This report summarizes research performed at the IBM Thomas J. Watson
Research Center during the past two years in the area of automatic
syntactic analysis of the Russian sentence. The activities described and the
results presented relate to two distinct surface structure parsing systems
for Russian. The main emphasis of the project was on the design and
development of the Combinatorial Syntactic Analysis (CSA) system, accompanied
by an extensive program of linguistic research on Russian grammar. A
considerably smaller effort, conducted in parallel with that on CSA, was
concerned with further work on multiple-path predictive syntactic analysis
of Russian.

The Combinatorial Syntactic Analysis system is an exhaustive automatic
sentence parsing system which produces surface structure descriptions
of sentences by systematically forming all grammatical combinations
of adjacent pairs of constituents in a bottom-to-top, left-to-right sequence.
The grammars and syntactic coding accepted by the system are written in
a special metalanguage in which grammatical constituents are treated as
structured symbols consisting of a part-of-speech name followed by a string
of tags, or attribute/value pairs. Because of the extensive facilities
provided by the metalanguage for introducing tags and defining operations on
them, grammars developed for the CSA system can be made significantly
more powerful and compact than those of the conventional IC type.

Part I of this report describes the parsing algorithm, the tag
metalanguage, and the overall organization of the Combinatorial Syntactic
Analysis system. More detailed accounts of the logical organization of the CSA
parser and the associated Dictionary Assembly/Update system are included
in two appendices to that section. A description of linguistic research on the
CSA Russian grammar is presented in Part II, along with a brief summary
of related language processing activities. In addition to a presentation of
representative rules of the experimental Russian grammar developed and a
report on subclassification studies, Part II includes an extensive account of
further investigations of such key topics in Russian grammar as apposition,
predication, and coordination.

Part III, the final section of the report, summarizes research
activities on predictive syntactic analysis of Russian. The main accomplishments
in this area were 1) bringing the multiple-path predictive Russian Syntactic
Analyzer into operational status at IBM Research, 2) expansion of the
dictionary, and 3) testing and evaluation of the performance of the Analyzer on
several thousand words of Russian text.
EVALUATION


1. Subject R&D effort constitutes a transition from the
limited environment syntactic analysis, embodied in the
bidirectional single pass technique of the Mark II system,
to the sentence wide syntactic analysis for an ultimate
adaptation to a software system for production of syntactic
level Russian-to-English machine translations of scientific
and technical texts.


2. The main effort was directed toward development of the
combinatorial syntactic analysis system for exhaustive parsing
of Russian sentences. It was combined with thorough linguistic
research on the Russian grammar for combinatorial syntactic
analysis. Subclassification of parts of speech according to
their syntactic-semantic features deserves special attention
in this effort. Equally significant is the adoption of the
linguistic notion of "slovosochetaniye" (grammatically bound
word group) for a further sophistication of the syntactic
recognition program.


3. A small-scale research effort in predictive syntactic
analysis was conducted in parallel with the work on the
combinatorial syntactic analysis system, in order to compare the
relative merits and deficiencies of both these surface structure
parsing systems. This effort was aimed primarily at bringing
the Russian Predictive Syntactic Analyzer into operational
status. The Russian-English structural transfer study, intended
in this effort, did not yield any significant results due to
the fact that the output from the Analyzer was not available
until the end of the contract period. Research on the predictive
syntactic analysis has revealed the desirability of
transformational syntactic recognition. The report is concluded
with the following meaningful statement: "While the [...]
of constructing a huge array of interlocking "microgrammars"
in order to handle texts of various types [...] uninviting one,
the possibility of construc[...]tive grammar adequate for a
single specific [...] appears worthy of serious exploration."
I. THE CSA SYSTEM: ANALYSIS ALGORITHM, METALANGUAGE,
AND SYSTEM ORGANIZATION


1.0 Introduction

The Combinatorial Syntactic Analysis (CSA) system is an exhaustive
sentence parsing system which has been implemented in the FAP language
on the IBM 7094. When supplied with a set of grammar rules, together with
appropriate syntactic alternatives for each word in a sentence, the CSA
system produces all surface structure analyses of the sentence that are
consistent with the rules of the grammar and the syntactic coding of the words.

The analysis algorithm builds up structural descriptions of a sentence by
systematically forming all grammatical combinations of adjacent constituent
pairs in a bottom-to-top, left-to-right sequence. When the process
terminates, following formation of the final combination involving the last
alternative of the last word in the sentence, both complete sentence structure
trees and various intermediate results are retrieved, edited, and printed out.

The grammars and syntactic coding accepted by the CSA system are
written in a special metalanguage in which grammatical constituents are
treated as structured symbols consisting of a constituent name followed by a
string of tags, or attribute/value pairs. In their overall form, grammar
rules are currently limited to those of the binary immediate constituent (IC)
type; that is, all rules are of the general form C1 + C2 = C3, which signifies
that a constituent of type C1 can be combined with an immediately following
constituent of type C2 to form a constitute (or higher-order constituent) of
type C3. Because of the extensive facilities provided by the metalanguage
for introducing tags and defining operations on them, however, grammar
rules employed in the CSA system can be made significantly more powerful
and compact than those of conventional IC grammars.

In the work on syntactic analysis of Russian, the use of tags has been
especially valuable in dealing effectively with a variety of syntactic
properties of Russian constructions, in particular, agreement and government
relationships involving such attributes as case, number, and gender.
Moreover, since it permits free introduction of tags in both grammar rules and
dictionary coding without changing the analysis program, the metalanguage
provides a convenient vehicle for experimental investigation of new syntactic
and semantic relationships. The flexibility of the tag notation has also been
underscored by the complete ease with which token grammars of other
languages (English, German, and Hungarian) have been accepted and applied by
the CSA system. Before presenting detailed descriptions of the
metalanguage (Section 1.2) and of the CSA system organization (Section 1.3), a
brief sketch will be given of the analysis algorithm employed by the CSA system.
1.1 The Analysis Algorithm


Beneath an overlay of tag operations, which will be discussed below
in Section 1.2, the analysis algorithm employed by the CSA system makes
use of a parsing strategy which is similar to one originally described by
Sakai (1962), but whose specific details are due to Kuno (1965). The general
flow of the CSA parsing algorithm is as follows.

Before actual parsing begins, each word in the text is supplied with
syntactic alternatives by a process of dictionary lookup. These alternatives
represent various mutually exclusive syntactic properties of a word form:
for example, PEC6 can be a noun ('oven') or an infinitive form of a verb ('to
bake'), STOL ('table') can be a masculine singular noun in either the
nominative or accusative case, and so on.

The analysis algorithm treats each sentence as a unit, processing the
syntactic alternatives of its component words from left to right. The
alternatives are read one at a time into a work area which occupies a large
storage matrix. As a given alternative is read in, it is assigned boundary
markers indicating what word it spans in the sentence; for example, all
syntactic alternatives of the third word in a sentence are assigned the boundary
markers (3, 3). The syntactic alternative and its associated markers are
then entered in the first available row of the storage matrix.

At this point, using the boundary markers as a guide, the parser
pairs the new syntactic alternative with each entry in the matrix which
represents a constituent that is left-adjacent to it in the sentence. As each
pair is formed, it is looked up in the table of grammar rules to determine
whether or not its components can legitimately be combined into a constitute,
or higher-order constituent. If they can, a new row is created in the
matrix for each valid combination. Each such row contains not only a pair of
boundary markers, indicating the part of the sentence spanned by the
higher-order constituent, but also a pair of numbers indicating the locations
of its immediate constituents in the matrix.

When  all  combinations  of  a  syntactic  alternative  with  its  left-adjacent 
neighbors  have  been  tried,  the  algorithm  moves  on  to  the  following  rows  of 
the  matrix  and  searches  for  additional  combinations  by  successively  pairing 
each  of  the  new  constitutes  with  each  of  its  left-adjacent  neighbors.  As  soon 
as  no  further  rows  remain  to  be  processed,  a  new  syntactic  alternative  is 
read  in,  and  the  above  process  is  repeated.  The  process  terminates  when 
the  program  attempts  to  read  in  another  alternative  at  a  point  where  none 
remains  to  be  processed. 

As a simple illustration of the operation of the parsing algorithm,
consider the case of a hypothetical sentence whose five component words
have been assigned the (unique) string of syntactic alternatives displayed in
(1) and which is to be parsed exhaustively using the grammar rules (2).

(1)  A  N  V  N  N
     1  2  3  4  5

(2)  A + N = N
     N + V = S
     N + N = N
     V + N = V

The course of the analysis process can be followed by examining successive
rows of Table I-1, which displays the final contents of the storage matrix
for the grammar and sentence under consideration. The process begins
when the first syntactic alternative (A) is read into the first row of the
previously empty storage matrix. Since there are no entries in the matrix that
represent left-adjacent sentence neighbors with which A can be paired, the
next syntactic alternative (N) is immediately read into row 2. The program
then searches for all matrix entries whose rightmost word number is one
less than that of the leftmost word spanned by the current item, since this
is precisely the condition for left-adjacency. The entry in line 1 fulfills the
condition; hence, it is paired with the entry in line 2, forming the couple
(A, N), which is looked up in the table of grammar rules.

The (A, N) pair matches the left half of a rule in the grammar,
indicating that (A, N) represents a grammatically permissible combination. The
right half of the same rule indicates that the resultant constituent is an N.
The analysis program copies the result in the third row of the matrix, along
with indications (a) that the new constituent spans words 1 and 2 in the
sentence and (b) that its components occupy rows 1 and 2 of the matrix.* Since
all combinations of the entry in row 2 with left-adjacent neighbors have now
been exhausted, the program proceeds to row 3. When no candidates are
found for combination with that entry (at that point, the last one in the
matrix), the program reads the next syntactic alternative (V) into row 4. The
parsing process continues in this fashion until all possibilities have been
exhausted, yielding the storage matrix configuration of Table I-1.


* It should be noted that, in dealing with actual grammars, there may be
several grammatically acceptable ways of combining a given ordered pair
of constituents. Such alternatives are represented by a collection of
subrules grouped under a heading consisting of the constituent pair in question.
Whenever a constituent pair in the sentence matches the heading of such a
rule, it is tested against all subrules of that rule, and each one that applies
gives rise to a new row in the storage matrix.
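
The flow described above can be illustrated with a short Python sketch. It is not the original program (which was written in FAP for the IBM 7094) and it ignores tags entirely; the names `Row`, `csa_parse`, `SENTENCE`, and `GRAMMAR` are hypothetical. The test used to bar a constituent from further combination (same name and same span as an existing row) is an assumption made for this illustration, since the text does not state how the duplication is detected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    left: int                    # leftmost word spanned
    right: int                   # rightmost word spanned
    name: str                    # constituent name
    lsub: Optional[int] = None   # row number of left subconstituent
    rsub: Optional[int] = None   # row number of right subconstituent
    blocked: bool = False        # barred from further combination


def csa_parse(words: List[List[str]],
              rules: Dict[Tuple[str, str], List[str]]) -> List[Row]:
    """Exhaustive bottom-to-top, left-to-right pairing of adjacent constituents."""
    matrix: List[Row] = []       # row n of the table is matrix[n - 1]

    def combine(idx: int) -> None:
        current = matrix[idx]
        if current.blocked:
            return
        for j in range(idx):     # candidate left neighbors are earlier rows
            neighbor = matrix[j]
            # Left-adjacency: neighbor ends one word before current begins.
            if neighbor.blocked or neighbor.right != current.left - 1:
                continue
            for result in rules.get((neighbor.name, current.name), []):
                # Assumed duplicate test: same name over the same span.
                duplicate = any(
                    (r.left, r.right, r.name)
                    == (neighbor.left, current.right, result)
                    for r in matrix)
                matrix.append(Row(neighbor.left, current.right, result,
                                  j + 1, idx + 1, duplicate))

    for position, alternatives in enumerate(words, start=1):
        for name in alternatives:
            # Read one syntactic alternative, then process it and every
            # row it gives rise to before reading the next alternative.
            matrix.append(Row(position, position, name))
            idx = len(matrix) - 1
            while idx < len(matrix):
                combine(idx)
                idx += 1
    return matrix


SENTENCE = [["A"], ["N"], ["V"], ["N"], ["N"]]          # string (1)
GRAMMAR = {("A", "N"): ["N"], ("N", "V"): ["S"],        # rules (2)
           ("N", "N"): ["N"], ("V", "N"): ["V"]}
MATRIX = csa_parse(SENTENCE, GRAMMAR)
for number, row in enumerate(MATRIX, start=1):
    print(number, row.left, row.right, row.lsub, row.rsub, row.name,
          "*" if row.blocked else "")
```

Run on string (1) with rules (2), the sketch creates sixteen rows in the same order as the storage matrix discussed in the text, with the second V spanning words 3 through 5 marked as barred.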
Table I-1.

Storage Matrix for Combinatorial Syntactic Analysis of a Sample Sentence

          Boundary Markers        Row Numbers
          of String Spanned       of Subconstituents
Row       Leftmost   Rightmost    Left          Right          Name of
Number    Word       Word         Constituent   Constituent    Constituent

  1       1          1            -             -              A
  2       2          2            -             -              N
  3       1          2            1             2              N
  4       3          3            -             -              V
  5       2          3            2             4              S
  6       1          3            3             4              S
  7       4          4            -             -              N
  8       3          4            4             7              V
  9       2          4            2             8              S
 10       1          4            3             8              S
 11       5          5            -             -              N
 12       4          5            7             11             N
 13       3          5            8             11             V
 14       3          5            4             12             V*
 15       2          5            2             13             S
 16       1          5            3             13             S

* Since its behavior will duplicate that of the constituent in row 13, the
constituent in row 14 is prevented from entering into further combinations.


In processing the contents of the matrix prior to final printout, the
program finds only one "complete" analysis, corresponding to the S in row
16, which spans the entire sentence. In addition, there is a partial analysis
(corresponding to the V in row 14), the remainder of which has been
suppressed because it will duplicate corresponding portions of the first
analysis. In tree format, with row numbers indicated in parentheses opposite
each node, the analyses look as follows:
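
The same structures can be recovered mechanically from the subconstituent row numbers recorded in the storage matrix. The following Python sketch is an illustration only, not the original retrieval and editing routine: it copies the sixteen rows of the sample matrix into a hypothetical `TABLE` dictionary and prints each analysis in a bracketed form, with row numbers in square brackets, that was chosen here for convenience.

```python
# Row number -> (leftmost word, rightmost word, left row, right row, name),
# transcribed from the storage matrix for the sample sentence.
TABLE = {
    1: (1, 1, None, None, "A"),
    2: (2, 2, None, None, "N"),
    3: (1, 2, 1, 2, "N"),
    4: (3, 3, None, None, "V"),
    5: (2, 3, 2, 4, "S"),
    6: (1, 3, 3, 4, "S"),
    7: (4, 4, None, None, "N"),
    8: (3, 4, 4, 7, "V"),
    9: (2, 4, 2, 8, "S"),
    10: (1, 4, 3, 8, "S"),
    11: (5, 5, None, None, "N"),
    12: (4, 5, 7, 11, "N"),
    13: (3, 5, 8, 11, "V"),
    14: (3, 5, 4, 12, "V"),
    15: (2, 5, 2, 13, "S"),
    16: (1, 5, 3, 13, "S"),
}


def bracket(row: int) -> str:
    """Follow the subconstituent pointers down from the given row."""
    _, _, left_row, right_row, name = TABLE[row]
    if left_row is None:                      # lexical alternative: a leaf
        return f"{name}[{row}]"
    return f"{name}[{row}]({bracket(left_row)} {bracket(right_row)})"


SENTENCE_LENGTH = 5
# A "complete" analysis is an S whose boundary markers cover every word.
COMPLETE = [n for n, (lo, hi, _, _, name) in TABLE.items()
            if name == "S" and lo == 1 and hi == SENTENCE_LENGTH]

for n in COMPLETE:
    print(bracket(n))
print(bracket(14))                            # the suppressed partial analysis
```

Only row 16 qualifies as a complete analysis; row 14 yields the alternative grouping of the last two nouns under the verb.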


1.2  The  Tag  Metalanguage 

Much of the power and flexibility of the CSA system as a research
tool is attributable to properties of the metalanguage in which the grammar
rules are written. As has already been noted above, a key feature of the
metalanguage is the treatment of grammatical constituents as structured
symbols, each of which consists of a constituent name followed by a string
of tags, or attribute/value pairs. Although tags are not restricted to binary
values, strings of tags have obvious formal similarities to the syntactic
feature vectors employed in recent formulations of transformational grammar
(Chomsky, 1965). However, the principal influences on the development of
the tag metalanguage have been two earlier systems that have employed
structured symbols: the COMIT programming language (Yngve, 1961), with
its logical subscripts, and the grammatical index notation employed in
multiple-path predictive analysis of Russian (Plath, 1963).

The present metalanguage shares with the grammatical index notation
the property of being a rule-writing language in which variables play an
important role, but it is also endowed with a COMIT-like facility for ad-lib
introduction of names of constituents, attributes, and values. The following
is a detailed formal description of the properties of the metalanguage.
1.2.1 Rule and Subrule Format

As was noted in Section 1.1, the analysis program systematically
tests each pair of adjacent constituents with part-of-speech codes C1 and
C2 against all subrules grouped under the corresponding heading.
Accordingly, it is necessary in preparing a grammar for the system to organize
the rules into "packets", each containing the complete set of subrules for a
given ordered part-of-speech pair. Every subrule consists of the following
parts:

[(<label>)] <part of speech> <tag conditions> + <part of speech> <tag
conditions> = <part of speech> <tag replacements> [<transfer section>]

The above notation signifies: A subrule contains an optional label,
enclosed in parentheses, followed by a part of speech (C1), tag
conditions, a plus sign, a part of speech (C2), tag conditions, an equals
sign, a part of speech (C3), tag replacements, and optionally a
transfer section. The section of the subrule between the label and
the equals sign is called the left half of the subrule; the part between
the equals sign and the transfer section is called the right half of the
subrule. As we have seen, the left half of a subrule contains a pair
of parts of speech with tag conditions, and the right half contains a
part of speech with tag replacements. All subrules with the same
ordered pair of part-of-speech codes in their left half must be
grouped together, and preceded by a rule header, written:

*<part of speech> + <part of speech>

A rule header, followed by one or more subrules having the same
part-of-speech pair in the left half as was written on the rule header,
is called a rule packet. The following is a sample rule packet taken
from a simplified grammar of German:

* PRON + VERB

PRON SC/PERS TY/E CS/NOM NR/X + VERB SC/FIN TY/E
NR/X = PRED SC/PERS SS/PRVAUX NR/X

PRON CS/X + VERB G1/X = VERB TY/P G1/2-X ETC/2

In the rule packet just illustrated, the part-of-speech pair is (PRON,
VERB) and this pair appears as the two parts of speech in the left-half
sections of the illustrated subrules. These subrules do not happen
to contain labels or transfer sections. However, each has a different
set of tag conditions, and a different tag replacement section.
Whenever two adjacent constituents are encountered in the course of
parsing a sentence, and the left constituent has part of speech PRON,
while the right constituent has part of speech VERB, the rule packet
for PRON + VERB will be executed. Any subrules whose tag conditions
are met will cause a new constituent to be generated whose part
of speech is given by that of the right half, and whose tags are
determined by the pattern given in the tag replacement section. For
example, if the two constituents to be combined are:

PRON SC/PERS TY/E CS/NOM NR/SING and

VERB TENSE/PRES NR/SING SC/FIN TY/E G1/DAT,ACC REFL/NO

then the first subrule in the rule packet shown above will be
successful. The "variable" X will be set to "SING" and the new constituent
dominating the above two constituents will be:

PRED SC/PERS SS/PRVAUX NR/SING.

If the PRON given above were marked TY/G or NR/PLUR, then the
conditions would not be met and no new constituent would be
written.

If the two constituents to be combined are:

PRON SC/PERS TY/E CS/DAT NR/PLUR and

VERB SC/INF TY/E NR/NONE TENSE/NONE G1/ACC,DAT

then the second subrule will "succeed" and the following new
constituent will be created:

VERB TY/P G1/ACC NR/NONE TENSE/NONE SC/INF.

If now, immediately to the left of the first PRON, there is another
PRON with PRON SC/PERS TY/E CS/ACC NR/SING, then this
constituent will be adjacent to the new constituent spanning both the first
PRON and VERB, and can combine with it according to the same rule,
producing:

VERB TY/P G1/ NR/NONE TENSE/NONE SC/INF.

1.2.2 Data Tags and Data Constituents

During the course of analysis, any two adjacent constituents are
tested against the rule packet, if any, for the ordered part-of-speech pair
which they constitute. If there are successful subrules in this rule packet,
new constituents spanning the range of word numbers spanned by both
original constituents are created. The initial constituents, which must be
present before any constituents can be generated by application of a subrule, are
the syntactic alternatives assigned by dictionary lookup to each of the words
of the sentence. These constituents normally have no "subconstituents" (i.e.,
pairs of constituents from which a constituent is generated by application of a
rule and which are the subnodes of the constituent in the generated tree) and
are of unitary word-span. The constituents which are matched against the
rules during parsing, whether they come from the original input data or
whether they have been created by previous applications of grammar rules,
are generally called data constituents and consist of a part-of-speech name
plus a number of tags, which, in order to distinguish them from the
similarly-written components of rules called tag conditions, are sometimes
called data tags. Hence the tag metalanguage in which rules are written is
a language containing tag conditions and tag replacements for matching and
generating data tags.

The  form  of  data  constituents 

Data constituents have the following (externally printed or punched)
form: <part of speech> <tags>.

The part of speech has the form: <symbol>, where a symbol is any string
of six or fewer consecutive alphanumeric characters, the first of which is
alphabetic. The alphabetic characters are:
ABCDEFGHIJKLMNOPQRSTUVWXYZ*

The numeric characters are:

0123456789

The following are legal symbols:

PRON ABC1 G3*Q

The following are not legal symbols:

3A (first character not alphabetic)

AB C (characters separated by a blank)

PRONOUN (more than six characters)

AB(C (contains non-alphanumeric character)

The tags field is a list of zero or more individual tags, separated by blanks.
Each tag has the following form: <attribute>/<value>, where a value has the
form: * or $ or <constant symbol>
or <constant symbol>, ... <constant symbol>
where an attribute is any symbol, and a constant symbol is any symbol
beginning with a letter from A through W.

The following are valid data constituents:

ART

PRON CS/NOM

VERB G1/* G2/ACC,DAT TENSE/PRES NUMB/PLUR PERSON/!

NOUN TY/PHRASE LEFT/NO RIGHT/S1 CASE/ADESS

The following are not valid data constituents:

ART/DEF (part of speech must be present)

ART DEF/YES (a constant must not begin with X, Y, Z, or *)

NOUN CASE/S1,$ (the dollar sign must stand alone)

VERB TYP/E, TENSE/PRES (the tags must be separated by blanks,
not commas)

NOUN G1/ CASE/ALLAT (some value must be given -- the * has the
semantic meaning of a null value and should
be punched; it will not be printed)

NOUN PERSON/3 NUMB/SING (a numeral is not a symbol as defined
above)
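
The well-formedness rules for symbols and data constituents given in this section can be summarized as a small checker. The Python sketch below is an illustration only; the function names are hypothetical, and it assumes that a tag is punched without internal blanks (so a value list is written with commas and no spaces).

```python
import re

# A symbol: at most six alphanumeric characters, the first alphabetic.
# The asterisk is listed among the alphabetic characters.
SYMBOL = re.compile(r"[A-Z*][A-Z0-9*]{0,5}")


def is_symbol(text: str) -> bool:
    return SYMBOL.fullmatch(text) is not None


def is_constant(text: str) -> bool:
    # A constant symbol begins with a letter from A through W.
    return is_symbol(text) and "A" <= text[0] <= "W"


def is_value(text: str) -> bool:
    # Either * (null), or $ standing alone, or a comma-separated
    # list of one or more constant symbols.
    if text in ("*", "$"):
        return True
    return all(is_constant(part) for part in text.split(","))


def is_data_constituent(line: str) -> bool:
    fields = line.split()
    if not fields or not is_symbol(fields[0]):
        return False                      # part of speech must be present
    for tag in fields[1:]:
        attribute, slash, value = tag.partition("/")
        if not (slash and is_symbol(attribute) and is_value(value)):
            return False
    return True


for candidate in ["PRON CS/NOM", "ART DEF/YES", "NOUN G1/ CASE/ALLAT"]:
    print(candidate, "->", is_data_constituent(candidate))
```

Under these assumptions the checker accepts PRON CS/NOM and rejects the other two candidates, in agreement with the lists above.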

1.2.3 Subrule Tags and Subrule Constituents

As was mentioned earlier, rules are organized into rule packets,
consisting of one or more subrules with a common part-of-speech pair.
Furthermore, each subrule consists of an optional label, a left half, an
equals sign, a right half, and an optional transfer section. The left half
contains a part-of-speech code with associated tag conditions for each of
the two constituents combined by the subrule.

The  form  of  tag  conditions  (left  half) 

Tag conditions have the following form:

<tag conditions> = <empty>
                or <tag condition> [... <tag condition>]

<tag condition> = <test attribute>/<specifier>

<specifier> = *
           or <constant symbol>[, <constant symbol> ...]
           or <variable>
           or <variable> - <constant symbol>[, <constant symbol> ...]
           or (<constant symbol>)
           or (<variable>)

<variable> = any symbol beginning with X, Y, or Z.

When a subrule is tested, it is known that the two data constituents have the
part-of-speech codes specified in the subrule, for only that rule packet
headed by the appropriate part-of-speech pair is executed when the two
adjacent constituents are matched against the grammar. The tag conditions
on the subrule are executed sequentially, and each may either succeed or
fail. When a tag condition succeeds, the next tag condition is tried, until
all tag conditions have been tested successfully, in which case the subrule
is said to succeed. When a tag condition fails, no more tag conditions are
tried and the subrule is said to fail. In addition to causing a subrule to
succeed or fail, tag conditions can have side effects. The principal side effect
of a tag condition is variable-setting, since this affects a future tag condition
or replacement within the subrule. The following are the different types of
tag conditions tabulated according to the type of specifier which can appear
in them.

Tag  conditions  classified  according  to  specifier  type 

1. *: If the test attribute is not present on the data constituent being tested,
or if the attribute is present but has no value, then the tag condition
succeeds. If the attribute is present with some value, the tag condition
fails.

2. <constant symbol>: If the test attribute is present on the data
constituent and the constant symbol appears in its value field, then the tag
condition succeeds; otherwise, it fails.

3. <constant symbol>, ... <constant symbol>: If the test attribute is present
on the data constituent and at least one of the constant symbols in the list
appears in its value field, the tag condition succeeds; otherwise, it fails.

4. <variable>: If the test attribute is not present on the data constituent,
the tag condition fails. Otherwise, there are two cases:

Case a: If this is the first occurrence of the variable in the subrule,
then the variable must be defined. This is accomplished by assigning
the variable the entire contents of the value field of the test attribute on
the data constituent. (Such an assignment may be modified during
further processing of the current pair of data constituents in accordance
with the subrule in question. In any event, it does not remain in force
after completion of the processing of the current constituent pair
according to that subrule.)

Case b: If this is not the first occurrence of the variable in the
subrule, it has previously been defined during this application of the subrule
to the current data constituent pair. Delete from the defined value of the
variable all items which do not appear in the value field of the test
attribute on the data item. If nothing remains, the tag condition fails. If one
or more values remain, then these are precisely the values common to
both the data constituent attribute from which the variable was defined
and the data constituent attribute currently being tested, and the tag
condition succeeds.

5. <variable> - <constant symbol>[, <constant symbol> ...]: First, as in
4, either the variable is defined, or the already defined variable is
tested. In addition, all constants appearing on the list after the minus
sign in the rule are also deleted from the definition of the variable. If
nothing remains, the tag condition fails. If one or more values remain,
they are guaranteed (in the absence of repetitions) to be other than those
in the list of constants after the minus sign (the "exclusion list"), and
the tag condition succeeds.

6. (<constant symbol>): The same as 2 above, provided that the test
attribute is present on the data constituent. Unlike 2, however, if the test
attribute is not present, the tag condition does not fail, and execution
of the subrule continues.

7. (<variable>): The same as 4 above, except that the tag condition cannot
fail unless the test attribute is present on the data constituent. If the
attribute does not appear, the variable is defined with null value, but
execution of the subrule continues.


Note: When the $ appears on a data constituent as the value of an
attribute, it matches any constant from a tag condition of type (2)
above, as well as any constant referred to by a variable in a tag
condition of type (4). When * appears as the value of an attribute on a
data constituent, it represents a null value and hence can only satisfy
a tag condition of
typo  »*). 
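
The variable-handling tag conditions (types 4, 5 and 7) can be pictured
as operations on sets of values. The following is a minimal Python
sketch under stated assumptions, not the CSA implementation: the name
`apply_variable_condition`, the dictionary representation of tags, and
the reading that a missing attribute fails a non-parenthesized condition
are all illustrative, and the special values $ and * are not modeled.

```python
def apply_variable_condition(tags, attribute, var, bindings,
                             exclude=(), optional=False):
    """Apply a variable tag condition to one data constituent.

    tags      -- dict mapping attribute name to a list of values
    bindings  -- dict mapping variable name to its defined set of values
    exclude   -- constants of the exclusion list (type 5), if any
    optional  -- True for the parenthesized form (type 7)
    Returns True if the tag condition succeeds.
    """
    if attribute not in tags:
        if optional:
            # Type 7: attribute absent; an undefined variable gets a
            # null value and the condition does not fail.
            bindings.setdefault(var, set())
            return True
        return False  # assumption: type 4/5 fail on a missing attribute
    data_values = set(tags[attribute])
    if var not in bindings:
        value = data_values                    # case a: define variable
    else:
        value = bindings[var] & data_values    # case b: keep common values
    value = value - set(exclude)               # type 5: drop excluded constants
    bindings[var] = value
    return bool(value)
```

Used twice with the same variable, the function first defines the
variable and then narrows it to the values common to both attributes,
which is how agreement between two constituents is expressed.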

Right  half  of  a  subrule 

When a subrule is applied, its tag conditions are executed
interpretively as instructions to make tests and to set variables. If
the subrule fails, no new constituent is created. If the subrule
succeeds, a new constituent is created according to the specifications
of the part of speech and tag replacement section of the right half of
the subrule. The part of speech in the right half of a rule can be
either a constant symbol (in which case that constant becomes the part
of speech of the rewritten constituent), or a variable (in which case
the value defined for that variable -- which must be restricted to a
single value -- becomes the part of speech of the rewritten
constituent). After the part of speech come the tag replacements, each
of which has the form:

(attribute) / (replacement specifier)
or ETC/1 or ETC/2 or ETC/1,2

The attribute must be a constant symbol; the permissible tag replacement
specifiers with their interpretations are as follows.

Tag  replacements  classified  according  to  specifier  type 

1. (constant symbol) or (constant symbol), ... (constant symbol): The
attribute, along with the constant symbol(s) specified in the subrule,
is added to the tags field of the constituent being generated.

2. *: The attribute, with no value, is added to the tags field of the
constituent being created.

3. (variable) [, (constant symbol), ...]: The attribute with a value
consisting of the defined value of the variable plus the constant
symbols, if any, is added to the tags field of the generated
constituent.

4. (numeral) - (variable): The numeral is either 1 or 2, referring to
the left subconstituent (C1) or the right subconstituent (C2). The given
attribute is looked for on C1 or C2, and its values are copied onto the
created constituent as values of the given attribute, except for such
values as are part of the definition of the variable, which are not
copied. It is possible to generate an attribute with no values using
this replacement specification, since all the values from the C1 or C2
tags might be among the values of the given variable. This specification
is especially useful in "erasing" a matched value from a tag specifying
a list of possible matches. For example, if C1 contains GOV/A,B,C and C2
contains TYPE/B, then the subrule

C1 ... GOV/X + C2 ... TYPE/X = C3 ... GOV/1-X

will create a constituent whose new GOV has the values A, C.

5. The tag replacement has the form ETC/1 or ETC/2 or ETC/1,2: All the
tags from data constituents C1, C2, or C1 and C2 are copied onto the
newly generated constituent with the exception of those whose attributes
have been previously mentioned anywhere in the subrule. If the user
wishes to copy a tag whose attribute appears previously in the subrule,
he must repeat it explicitly in the right half.
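
The five replacement specifiers above can be sketched as a single
function that builds the tags of the new constituent. This is an
illustrative Python sketch only: the name `build_tags`, the tuple
encoding of specifiers, the sorted ordering of variable values, and the
rule that the first copy wins when ETC/1,2 meets the same attribute on
both subconstituents are assumptions, not features documented for CSA.

```python
def build_tags(replacements, c1, c2, bindings, mentioned):
    """Build the tags of a newly created constituent.

    replacements -- list of (attribute, specifier) pairs from the right half
    c1, c2       -- tag dicts of the left and right subconstituents
    bindings     -- variable name -> set of values defined in the left half
    mentioned    -- attributes named anywhere in the subrule (skipped by ETC)
    """
    sources = {1: c1, 2: c2}
    new = {}
    for attribute, spec in replacements:
        kind = spec[0]
        if kind == "const":      # type 1: constants given in the subrule
            new[attribute] = list(spec[1])
        elif kind == "null":     # type 2: attribute with no value
            new[attribute] = []
        elif kind == "var":      # type 3: variable value plus constants
            new[attribute] = sorted(bindings[spec[1]]) + list(spec[2])
        elif kind == "minus":    # type 4: copy from C1 or C2, minus the variable
            source = sources[spec[1]].get(attribute, [])
            new[attribute] = [v for v in source if v not in bindings[spec[2]]]
        elif kind == "etc":      # type 5: copy every tag not yet mentioned
            for numeral in spec[1]:
                for attr, values in sources[numeral].items():
                    if attr not in mentioned and attr not in new:
                        new[attr] = list(values)
    return new


# The GOV example from specifier 4: X was narrowed to {"B"} by the left half.
erased = build_tags([("GOV", ("minus", 1, "X"))],
                    {"GOV": ["A", "B", "C"]}, {"TYPE": ["B"]},
                    {"X": {"B"}}, {"GOV", "TYPE"})
```

Here `erased` holds GOV with the values A and C, matching the result
stated for the GOV/1-X example.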

1.2.4 Examples of Subrules and Their Application

Let us assume that the left data constituent to be matched against a
grammar subrule is:

A SUBCLS/B CASGOV/D ANGOV/NO FORM/N SPECA/SEN
SPECB/REM TYPE/NXS SPECC/LIM SPECD/DEV WORD/F3545

and the right data constituent is:

B SUBCLS/K CASE/D AN/NO FORM/W SPECA/REM SPECB/FIB
TYPE/NIS SPECC/KIM SPECD/GOOB WORD/E29H

Assuming the subrule is

A FORM/X + B SUBCLS/M = C FORM/X SUBCLS/M ETC/2

the subrule will fail, because the tag condition SUBCLS/M is not met,
since B is coded with SUBCLS/K. If B were coded with SUBCLS/M, then the
subrule would succeed with variable X set to N. The new constituent
would be

C FORM/N SUBCLS/M CASE/D AN/NO SPECA/REM SPECB/FIB
TYPE/NIS SPECC/KIM SPECD/GOOB WORD/E2961

Notice  that  it  was  necessary  to  specify  SUBCLS/M  on  the  right  half  as  well 
in  order  to  have  the  SUBCLS  copied.  The  ETC  specification  would  not  copy 
it,  since  SUBCLS  had  already  appeared  in  the  subrule. 

If the subrule is:

A SUBCLS/B CASGOV/X FORM/Y + B CASE/X FORM/Y =
C SUBCLS/P FORM/Y CASGOV/1-X

then the subrule fails, since Y is set to N from the C1 constituent, and
fails to "agree" on the C2 constituent. If the C2 had FORM/N instead of
FORM/W, then the subrule would succeed with X set to D and Y set to N.
The created constituent would be C SUBCLS/P FORM/N CASGOV/* (the * will
not print).

If the subrule is:

A SUBCLS/X1-B SPECB/X SPECC/Y SPECD/Z + B ILLEG/*
SPECA/X = C SPECA/Y SPECB/Z SPECC/W1 SPECD/W2

then the subrule fails, since X1 will be first set to B, and then the B
will be eliminated from the definition of X1, leaving a null definition,
which fails the subrule. The meaning of such a tag condition is "any X1
except B". If A had SUBCLS/C, then the subrule would succeed, with X1
set to C, X to REM, Y to LIM, Z to DIV, and the generated constituent
would be:

C SPECA/LIM SPECB/DIV SPECC/W1 SPECD/W2

If the B constituent had the tag ILLEG/PLUS, then the subrule would
fail, since the tag condition ILLEG/* requires the absence of tag ILLEG
or a null value on it.

1.2.5 Order of Execution of Subrules

Control is transferred to the rule packet for a given pair of parts of
speech when two adjacent constituents with those parts of speech are
found during analysis. Control begins in a rule packet by executing the
first subrule in the packet. At the end of execution of a subrule, the
subrule has either succeeded (and generated a new constituent), or
failed. Flow of control after this point is determined by whether the
subrule has succeeded or failed, and by what the user has indicated (if
anything) in the transfer section of his subrule.

The user may specify the next subrule in the packet to be executed in
case the subrule succeeds, and the next subrule in the packet to be
executed in case the subrule fails, by including a transfer section in
his subrule. The transfer section has the form: (S/(symbol) F/(symbol)).
The S/(symbol) is called the success transfer; the F/(symbol) is called
the failure transfer. The success transfer and failure transfer may
appear in either order; one or the other or both may be absent; if both
are absent the enclosing parentheses must also be absent. The (symbol)
must appear in the label section of another subrule in the same packet;
otherwise the symbol is undefined, and an error message is printed
during grammar compilation. Exception: If the symbol is "QUIT", it does
not refer to another labelled subrule in the rule packet, but indicates
that no more subrules are to be executed.

After a subrule is executed, the next subrule to be executed is the one
whose label appears in the success transfer of the current subrule if
the subrule has succeeded, and the one whose label appears in the
failure transfer if the subrule has failed. If the transfer symbol is
"QUIT", no more subrules are tried, and the matching process for this
pair of constituents is complete. If the transfer symbol has not been
specified by the user, control goes to the next consecutively written
subrule in the rule packet or, if this is the last subrule in the rule
packet, the rule is terminated (like "QUIT"). Hence, the user needs to
specify labels and transfer symbols only in those cases in which he
wishes to depart from the normal sequential execution of the subrules of
a rule packet. The user must take care that a subrule does not transfer
back to itself and that flow of control does not result in any subrule
being executed more than once.


Illustration of flow of control

A TY/A ST/X + B TY/K ST/X = C SPEC/FORM ST/X (S/RULE3)
A + B = D
(RULE3) A TY/X + B TY/X = E (F/QUIT)
A + B = D ST/K
A + B = C CL/W

The very specific first subrule tests for an exceptional set of tags on
A and B and generates a new constituent C. If A and B are not this
exceptional case, then control goes to the next consecutive subrule,
which has no tag conditions and hence must succeed, producing a D. If
the exceptional case has occurred, we do not wish to produce a D, so the
S/RULE3 causes a transfer of control to the third subrule, RULE3.
Another test is made (for agreement of TY) on A and B. If the subrule
succeeds, we wish to generate a new constituent E, but we also wish to
generate D ST/K and C CL/W as well. Hence we allow control to continue
to the next two subrules with no tag conditions, just in case RULE3
succeeds. If RULE3 fails, however, we do not wish to generate these
extra constituents, and, since there are no more subrules in this rule
packet, the failure transfer is QUIT.
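
The transfer rules of this section can be sketched as a small
interpreter loop. The Python below is a sketch under assumptions, not
the CSA code: the name `run_packet`, the dictionary layout of a subrule,
and the runtime guard against executing a subrule twice are
illustrative (the report leaves that last check to the user).

```python
def run_packet(subrules, apply):
    """Execute one rule packet for a pair of adjacent constituents.

    subrules -- list of dicts with optional keys "label", "S" and "F"
    apply    -- callable taking a subrule and returning the constituent
                it creates, or None if the subrule fails
    Returns the list of constituents generated.
    """
    labels = {r["label"]: i for i, r in enumerate(subrules) if "label" in r}
    created = []
    visited = set()
    i = 0
    while i < len(subrules):
        if i in visited:
            raise RuntimeError("subrule executed more than once")
        visited.add(i)
        rule = subrules[i]
        result = apply(rule)
        if result is not None:
            created.append(result)
        # Pick the success or failure transfer, if one was written.
        target = rule.get("S" if result is not None else "F")
        if target is None:
            i += 1                # default: next consecutive subrule
        elif target == "QUIT":
            break                 # no more subrules for this pair
        else:
            i = labels[target]    # transfer to the labelled subrule
    return created
```

With the five-subrule packet of the illustration, a success of the
first subrule followed by a failure of RULE3 yields only the
constituent C, because the failure transfer of RULE3 is QUIT.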


1.3 CSA System Organization

The overall organization of the CSA system is displayed schematically in
Figure I-1. The system comprises a parsing package (the Combinatorial
Syntactic Analyzer proper), a dictionary assembly/update package, a
dictionary lookup routine, and miscellaneous routines for input and
output processing and linguistic support. Brief general descriptions of
these four components are given below in 1.3.1-1.3.4, respectively. A
detailed description of the internal organization of the CSA parsing
package is presented in Appendix I-A; the Dictionary Assembly/Update
package is described in similar fashion in Appendix I-B.

1.3.1 Combinatorial Syntactic Analysis Routines

The  automatic  parsing  package  developed  under  the  present  contract 
consists  of  twenty-four  basic  routines  in  the  form  of  relatively  small  modules. 
Each  basic  operation  within  the  package  is  assigned  to  a  specific  module,  so 
that  all  direct  accesses  to  a  given  data  storage  area  used  in  parsing  are  made 
exclusively  by  the  module  which  has  the  competence  to  access  that  area.  This 
organization  makes  it  possible  to  confine  the  effects  of  system  changes  to  in¬ 
dividual  modules,  rather  than  involving  the  entire  program  package. 

The parsing process is divided into five major steps, the last three of
which are repeated for each sentence analyzed:

1. All the routines are loaded into memory and the various work areas
are laid out.

2. The grammar rules are read in and stored in various tables for
subsequent matching against syntactic alternatives of input text words
and higher-order constituents.

3. The syntactic alternatives for each word in an input sentence are
read into the computer.

4. The sentence is parsed exhaustively according to the CSA algorithm
(Section 1.1), with intermediate results accumulated in a storage
matrix.

5. The matrix is searched for complete analyses of the sentence. Each
such analysis is printed out in the form of a tree whose nodes consist
of constituent names and their associated grammatical tags.

The flow of control of the parsing program can be observed with
reference to Figure I-2. After the routines have been loaded in memory
and the work areas and tables have been laid out, the main program,
which performs the actual parsing, immediately branches to the monitor.
The monitor examines all control cards specifying the different tasks of
the system, such as compilation of grammar rules, printing out of
messages, and reading in of sentences to be analyzed.

Assuming that the first card is a $ GRAMMAR control card, the monitor
directs all grammar cards to the grammar compiler, whose task is to set
up the grammar rules for subsequent matching with the input text
constituents. For each card beginning with (a group heading card), the
compiler stores the BCD names for the rule constituent pair in unique
locations by calling a hash addressing routine HASH, which produces an
address pair. This address pair is next stored in an entry in the table
of grammar rules RULTAB, the entry address being found by hashing the
address pair itself. The RULTAB entry points to the first location in
the subrule table TAGTAB where the corresponding subrules will be
stored. The rule compiler GRAMAR then compiles all subsequent grammar
cards with that heading (the subrule cards) into consecutive locations
of TAGTAB in the form of a list structure.
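
The two-level lookup described above (names to addresses, address pair
to a packet of subrules) can be sketched with ordinary hash tables. The
Python below is an assumption-laden illustration: the class
`GrammarTables` and its methods are hypothetical, Python dictionaries
stand in for the HASH routine, and subrules are kept as plain strings
rather than as the list structure used in TAGTAB.

```python
class GrammarTables:
    """Toy stand-in for the name storage, RULTAB and TAGTAB."""

    def __init__(self):
        self.name_ids = {}   # constituent name -> integer "address"
        self.rultab = {}     # (left address, right address) -> packet index
        self.tagtab = []     # packets of subrules, stored consecutively

    def intern(self, name):
        # Each distinct name is stored once and identified by its address.
        return self.name_ids.setdefault(name, len(self.name_ids))

    def add_packet(self, left, right, subrules):
        # Corresponds to compiling a group heading card and its subrule cards.
        key = (self.intern(left), self.intern(right))
        self.rultab[key] = len(self.tagtab)
        self.tagtab.append(list(subrules))

    def packet_for(self, left, right):
        # Returns the subrules for a constituent pair, or an empty list.
        key = (self.name_ids.get(left), self.name_ids.get(right))
        index = self.rultab.get(key)
        return [] if index is None else self.tagtab[index]
```

Because names are reduced to addresses once, later matching compares
small integers rather than character strings, which appears to be the
motivation for storing each BCD name in a unique location.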

When GRAMAR reaches a control card, it returns to the monitor. If the
card contains the command $SENTENCES, control is transferred to the
sentence read routine SENTNC, which reads a sentence, each word of which
is coded on a word card (signalled by * in column one) followed by cards
containing its syntactic alternatives. SENTNC uses HASH to convert the
part-of-speech code and tags of each syntactic alternative to the
addresses of the locations where their BCD representations are stored
and records the addresses in a table in a common storage area.

When SENTNC encounters a card with two asterisks separated by a space,
it interprets this as an end-of-sentence signal. Control returns to the
monitor and from there to the main program ANALYZ, which now is ready to
perform the parsing.

Available  to  this  main  program  is  the  table  in  the  common  storage 
area  whose  entries  point  to  the  syntactic  alternatives.  These  entries  are 
copied  one  at  a  time  into  a  large  storage  matrix  BIGTAB;  the  program 
matches  all  adjacent  pairs  of  constituents  against  rules  in  the  grammar 
table,  creating  a  new  constituent  whenever  paired  constituent  names  match 
and  the  tag  conditions  succeed. 

The matching procedure is performed by feeding adjacent constituents to
a rule matching and replacement routine GETRL which returns (a) the
number of higher-order constituents (possibly zero) that can be formed
for this pair according to the rules of the grammar and (b) the address
of the first of these constituents. The main program ANALYZ copies all
these newly created constituents into BIGTAB and proceeds with analysis
until the end of the sentence has been reached.
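
The interplay between the main program and the matching routine can be
sketched as an exhaustive, left-to-right combination of adjacent
constituents. The Python below is a simplified sketch and not the ANALYZ
code: the function `parse` and its `combine` argument (standing in for
GETRL) are hypothetical, constituents carry only a name and word
boundaries, and tags are omitted.

```python
def parse(words, combine):
    """Form all combinations of adjacent constituents.

    words   -- list of lists: the alternative names for each word
    combine -- callable (left name, right name) -> list of new names
    Returns a list of (start, end, name) tuples, with end exclusive.
    """
    chart = []       # plays the role of the storage matrix
    ends_at = {}     # position -> constituents whose right boundary is there
    for pos, alternatives in enumerate(words):
        agenda = [(pos, pos + 1, name) for name in alternatives]
        while agenda:
            item = agenda.pop()
            chart.append(item)
            ends_at.setdefault(item[1], []).append(item)
            # Pair the new constituent with every constituent that ends
            # exactly where the new one starts.
            for left in list(ends_at.get(item[0], [])):
                for name in combine(left[2], item[2]):
                    agenda.append((left[0], item[1], name))
    return chart


# Hypothetical two-rule grammar: A + B = D and D + C = S.
rules = {("A", "B"): ["D"], ("D", "C"): ["S"]}
chart = parse([["A"], ["B"], ["C"]], lambda l, r: rules.get((l, r), []))
spanning = [c for c in chart if c[0] == 0 and c[1] == 3]
```

Every new constituent ends at the word currently being processed, so
all possible left neighbours are already in the chart when it is paired;
`spanning` then holds the constituents that cover the whole input.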

When the main program has finished analyzing a sentence, it transfers to
a sentence structure retrieval and editing program PRANS whose task is
to retrieve the results of the analysis according to an option control
card and print them out. The first operation normally performed by PRANS
is to print out the results of the analysis in matrix format by calling
a subroutine DB1G which edits and prints each row of the matrix
specified in BIGTAB, giving for each constituent its word boundaries,
subconstituent addresses, and tags. Next, PRANS scans BIGTAB for
constituents which span the entire sentence, and which have constituent
names agreeing with a set of permissible sentence structure nodes. PRANS
then prints out the tree for the sentence nodes which it locates,
recording the constituent name and tags for each node. According to an
option which may be selected by the user, trees representing the
distinctive portions of analyses suppressed because of their partial
duplication of previous analyses may also be retrieved and printed out.

After PRANS has processed all the permissible analyses, a test is made
to determine whether more sentences are to be processed. If so, control
returns to the sentence read program, and the process continues as
before. If no sentences remain, control is transferred to the monitor. A
final exit is made when the monitor reads a $EXIT control card.

1.3.2 Dictionary Assembly/Update Package

The Dictionary Assembly/Update package consists of eight basic modules
as well as the Sort/Merge system and input/output tape control routines.
The package was designed in modular fashion in order to facilitate
modification of and additions to the total system. As in the CSA system,
each basic operation within the program is assigned to a specific
module.

The system is entered through the transfer vector MONITOR. The CONTROL
routine handles the flow of control within the total assembly framework
and READP interprets and prints the user's parameters, specifying the
type of updating, type of printout, etc. CARDRD reads the input cards
obtained from the lexicographer, which consist of entries for new words
with their syntactic alternatives, entries to be deleted, and entries to
be changed. ASSLY, an "assembly" routine in the strict sense of the
word, collects these input parts into pre-dictionary records and
converts Cyrillic characters into machine-coded sortable bytes. The task
of the Sort/Merge system is to sort these records alphabetically on the
words and direct them to the updating routine UPDATE, which -- according
to the user's parameters -- adds, deletes, and/or changes the dictionary
entries, thereby creating an updated version of the current dictionary.
Also, according to options, the updated dictionary and the added,
deleted, and changed records may be edited and printed out by DICPNT for
human investigation. The eighth module, INTER, serves as a common
boundary between the Sort/Merge system, the input/output tape control
routines, and the remaining seven modules.

A  dictionary  entry  consists  of  several  logical  machine  records,  the 
first  of  which  contains  the  word  and  some  bookkeeping  information,  such  as 
the  date  of  acquisition,  name  of  lexicographer,  etc.  The  succeeding  logical 
records  contain  the  syntactic  alternatives  of  the  word. 
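
The updating step can be pictured as a single pass that merges
alphabetically sorted update records into the alphabetically sorted old
dictionary. The Python below is a minimal sketch under assumptions: the
function `update_dictionary`, the tuple layout of entries and update
records, and the policy of silently ignoring inapplicable updates (for
instance, deleting a word that is not present) are illustrative and are
not taken from the UPDATE module.

```python
def update_dictionary(old_entries, update_records):
    """Merge sorted update records into a sorted dictionary.

    old_entries    -- list of (word, alternatives), sorted by word
    update_records -- list of (word, action, alternatives), sorted by word,
                      at most one per word; action is "add", "delete"
                      or "change"
    Returns the updated list of (word, alternatives).
    """
    updated = []
    i = 0
    for word, action, alternatives in update_records:
        # Copy unchanged entries that sort before this update record.
        while i < len(old_entries) and old_entries[i][0] < word:
            updated.append(old_entries[i])
            i += 1
        exists = i < len(old_entries) and old_entries[i][0] == word
        if action == "add" and not exists:
            updated.append((word, alternatives))
        elif action == "change" and exists:
            updated.append((word, alternatives))
            i += 1
        elif action == "delete" and exists:
            i += 1
        # Any other combination leaves the old dictionary untouched.
    updated.extend(old_entries[i:])
    return updated
```

Because both inputs are in the same order, neither file has to be
rewound, which suits a dictionary kept on tape.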

For the Assembly/Update package a Sort/Merge system (Grove, 1967) is
utilized which avoids the need to write the input data set (update
records) on a tape before it can be sorted. In addition, substantial
parts of the sorting operation are overlapped with operations of the
Assembly/Update package in which the Sort/Merge system is embedded
(Figure I-3).

The update records to be sorted are assembled and transferred by ASSLY
to the initial sort phase without ever being placed on a physically
distinct update-record file. This initial phase of the Sort/Merge system
is actually time-shared with ASSLY. The final merge phase also does not
create a physically distinct update-record file. Instead, the
alphabetically sorted records are called, one by one, by the UPDATE
module. The latter matches them against the old dictionary and modifies
it accordingly, thereby creating an updated version of the dictionary
file. It is important to note that the desired sequence of the sort is
specified by a comparison routine SORT belonging to the Assembly/Update
package rather than to the Sort/Merge system. This comparison routine is
called by the Sort/Merge system whenever two update records are to be
compared.

1.3.3 Dictionary Lookup Program

This program performs dictionary lookup by comparing Russian text words
with a Russian dictionary. The program reads input sentences from tape
and creates logical records of one text word each. These records are
alphabetically sorted and serially looked up in a tape dictionary. The
output of the dictionary lookup is then resorted to conform with the
word order of the original input sentences. The resorted looked-up text
is now ready to serve as input to the parsing process.

The Sort/Merge system (Grove, 1967) employed by the Assembly/Update
package is also used by the dictionary lookup program. The result is
that substantial parts of the Sort/Merge operations are overlapped with
matching operations of the dictionary lookup program. Since the Russian
word records must be sorted alphabetically, matched against the
dictionary, and then resorted into their original text order, the last
phase of the first sort, the dictionary lookup proper, and the first
phase of resorting into text order all occur concurrently (Figure I-4).
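
The sort, serial lookup, and resort sequence can be sketched as follows.
This Python sketch makes several assumptions: the function
`look_up_text` and its data layout are hypothetical, an unmatched word
is reported with None, and the three phases run one after another here
rather than concurrently as in the program described above.

```python
def look_up_text(text_words, dictionary):
    """Look up text words by sorting, serial matching, and resorting.

    text_words -- list of words in text order
    dictionary -- list of (word, alternatives), sorted by word
    Returns a list of (word, alternatives or None) in text order.
    """
    # One record per text word, tagged with its position in the text.
    records = sorted((word, pos) for pos, word in enumerate(text_words))
    found = []
    d = 0
    for word, pos in records:
        # Advance serially through the dictionary; it is never rewound.
        while d < len(dictionary) and dictionary[d][0] < word:
            d += 1
        match = d < len(dictionary) and dictionary[d][0] == word
        found.append((pos, word, dictionary[d][1] if match else None))
    # Resort into the original text order (positions are unique).
    found.sort(key=lambda record: record[0])
    return [(word, alternatives) for _, word, alternatives in found]
```

Sorting the text first is what allows a single forward pass over the
dictionary, however many times a word recurs in the text.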

1.3.4 Miscellaneous Routines

Random  number  generator 

This  program  is  designed  to  produce  a  series  of  unique  random  num¬ 
bers  according  to  specified  limitations. 

A collection of n random numbers is generated where n is defined by the
user's parameter AMOUNT. These random numbers may range from 1 to a
maximum number m < 32000 which must also be defined by a user's
parameter RANGE. If the quantity of random numbers and/or the range is
to be changed, the corresponding parameters must be modified
accordingly.

After the generation of each random number, the number is checked
against previous results to determine whether or not it is identical to
a number that has already been created. If it is, a new random number is
generated, thereby assuring uniqueness of the numbers. In our
applications, the random number generator is inserted as a subroutine in
a Russian sentence selector program which serves to extract a sample of
n Russian sentences from a population of m sentences. The sample
sentences can then be processed by the Combinatorial Syntactic Analysis
system.
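
The generate-and-check procedure can be sketched in a few lines. The
Python below is an illustration only: the function name, the `seed`
argument, and the guard against asking for more numbers than the range
contains are assumptions added for the sketch; the parameters `amount`
and `range_max` correspond to AMOUNT and RANGE.

```python
import random


def unique_random_numbers(amount, range_max, seed=None):
    """Return `amount` distinct random integers from 1 to `range_max`."""
    if amount > range_max:
        # Without this guard the loop below could never finish.
        raise ValueError("cannot draw more unique numbers than the range holds")
    rng = random.Random(seed)
    chosen = []
    seen = set()
    while len(chosen) < amount:
        number = rng.randint(1, range_max)
        if number in seen:
            continue          # duplicate: generate a new number
        seen.add(number)
        chosen.append(number)
    return chosen
```

The numbers returned can serve directly as the indices of the sentences
to be sampled. In modern Python the same effect is available from the
standard library as `random.sample(range(1, range_max + 1), amount)`.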

Code  expansion  program 

In order to facilitate coding of text words which are associated with
long strings of tags, a compacted code may be assigned to the word
instead of the detailed part-of-speech and tag string. A program then
converts these "shorthand" codes into the expanded mnemonic syntactic
alternative codes (constituent name and tag string) accepted by the
Combinatorial Syntactic Analysis program. This is accomplished by
matching the compacted codes against a dictionary which contains these
codes as match arguments and expanded mnemonic alternatives as output
functions.
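
In outline, the expansion is a table lookup. The Python below is a small
sketch with invented data: the function `expand_codes`, the compact code
"N1", and its expansion are hypothetical examples and do not come from
the CSA code dictionary; raising an error for an unknown code is also an
assumption.

```python
def expand_codes(compact_codes, expansion_table):
    """Replace each compacted code by its expanded syntactic alternative.

    compact_codes   -- list of shorthand codes assigned to text words
    expansion_table -- dict: shorthand code -> constituent name and tag string
    """
    expanded = []
    for code in compact_codes:
        if code not in expansion_table:
            raise KeyError(f"no expansion defined for code {code!r}")
        expanded.append(expansion_table[code])
    return expanded


# Hypothetical table entry, for illustration only.
table = {"N1": "NOUN CASE/N FORM/W"}
expanded = expand_codes(["N1", "N1"], table)
```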

Russian  word  record  generator 

This program accepts keypunched Russian text as input and generates
serialized sentence records which serve as input either to the
dictionary lookup program of the CSA system or to that of the predictive
Russian Syntactic Analyzer (Section III). (As indicated in Figure I-1,
the user has the option of employing the sentence selector to extract a
random sample from the set of sentence records prior to the dictionary
lookup phase.)

Dictionary  feedback  program 


This program serves to inform the linguist about the status of the CSA
dictionary at any given time. By sorting the dictionary file on certain
control fields, it can (1) produce a list of all unique syntactic
alternatives in the file, thereby displaying all existing combinations
of a part-of-speech code with a tag string, and (2) retrieve all
alternatives having any combination of part of speech, tags, or tag
components specified by the linguist. Whatever options are selected, the
program always writes out counts of all master records, all alternative
records, and all uniquely patterned alternatives in the dictionary.

ROUTINES FOR DECK NAMED TAPEIO

System entry points: TAPOPN, TAPGET, TAPPUT


Deck:              TAPEIO
Routine:           TAPPUT
Type:              CSA-independent
                   I/O type dependent
                   Entry point
Calling sequence:  TSX  TAPPUT, 4
                   PZE  WHAT,, HOWMUCH
Function:          Puts out a physical record of length HOWMUCH
                   starting from location WHAT onto the output tape.
Restrictions:      HOWMUCH < internal buffer size (now 22). Most
                   tape systems require output record to be at least
                   three words long so as not to be recognized as a
                   "noise record".
Operation:         Calls CHECK to complete last I/O. Then copies
                   user buffer into internal buffer. Calls EMPTY to
                   start emptying internal buffer.


Deck:              TAPEIO
Routine:           FILL
Type:              CSA-independent
                   I/O type dependent
                   Internal to TAPEIO
Calling sequence:  TSX  FILL, 4
Function:          Starts filling the internal read buffer.


Deck:              TAPEIO
Routine:           EMPTY
Type:              CSA-independent
                   I/O type dependent (tape)
                   Internal to TAPEIO
Calling sequence:  TSX  EMPTY, 4
Function:          Starts emptying the internal write buffer.


Deck:              TAPEIO
Routine:           CHECK
Type:              CSA-independent
                   I/O type dependent (tape)
                   Internal to TAPEIO
Calling sequence:  TSX  CHECK, 4
Function:          Makes sure that last I/O has finished. It then checks
                   for end-of-file, redundancy, end-of-tape conditions.
                   If redundancy has occurred, retries up to 20 times.
                   If end-of-tape, rewinds and unloads tape. Sets
                   switches in either read or write buffer (whichever
                   was used last) to indicate either normal completion
                   or one of the above unusual ends. Then it exits.


Deck:              TAPEIO
Routine:           TAPOPN
Type:              CSA-independent
                   I/O type dependent (tape)
                   Entry point
Calling sequence:         CAL  BUFWD
                          LDQ  DEVAD
                          TSX  TAPOPN, 4
                   BUFWD  PZE  BUFFER,, LENGTH
                   DEVAD  PZE  ,, DEV
                   (AC sign + to open input, - to open output)
Function:          Must be called to initialize TAPGET and TAPPUT.
                   Defines an internal input (or output) buffer specified
                   by BUFWD, and specifies DEV as the address of the
                   tape device used for input (or output). (DEV is the
                   octal tape address.) For input, the BUFFER must
                   be three words longer than actual input buffer size
                   desired (to accommodate switches). For output,
                   BUFFER must be one word longer than desired.
                   Hence, for an input buffer of 14, and output of 22,
                   buffer sizes of 17 and 23 must be given, respectively.


Deck:              TAPEIO
Routine:           TAPGET
Type:              CSA-independent
                   I/O type dependent (tape)
                   Entry point
Calling sequence:  TSX  TAPGET, 4
                   PZE  USERBF,, LENGTH
                   ... end-of-file return ...
                   ... redundancy return ...
                   ... normal return ...
Function:          Reads one physical record into user's buffer. (Used
                   for line-input.)
Restrictions:      Specified length < internal buffer length (now 14).
Operation:         Calls CHECK to complete last I/O, if any. Checks
                   switches to see if last read was end-of-file or
                   redundant -- if so, exits appropriately. (If there was
                   only 1 redundancy, it retries until either a normal
                   return can be made or it gets 20 redundancies in a
                   row, in which case a redundancy exit is made.) On
                   completed operation, copies internal buffer with
                   next record.
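
The retry behaviour described for CHECK and TAPGET (retry on redundancy,
give up after 20 redundancies in a row) can be sketched as a bounded
loop. The Python below is only an illustration of that control flow: the
function `read_with_retries`, the status strings, and the `read_once`
callable are hypothetical and do not correspond to entry points of the
TAPEIO deck.

```python
MAX_RETRIES = 20  # limit on consecutive redundancies, as stated above


def read_with_retries(read_once, max_retries=MAX_RETRIES):
    """Call read_once() until a read succeeds or retries are exhausted.

    read_once -- callable returning ("ok", record), ("eof", None)
                 or ("redundancy", None)
    Returns ("ok", record), ("eof", None) or ("redundancy", None),
    mirroring the normal, end-of-file and redundancy returns.
    """
    redundancies = 0
    while True:
        status, record = read_once()
        if status != "redundancy":
            return status, record      # normal or end-of-file return
        redundancies += 1
        if redundancies >= max_retries:
            return "redundancy", None  # redundancy return after the limit
```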


ROUTINES FOR DECK NAMED FILE

System entry points: PUT, CARDRD, SET, SETS, BCDCAN, DEVCAN, BUFFER


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


Operation: 


Deck: 

Routine : 

Type: 

Calling  sequence: 


Function: 

Operation: 


Deck: 

Command: 

Type: 


FILE 

PUT 

Configuration-independent 
Entry  point 
TSX  PUT,  4 
PZE  WHAT, ,  HOWMUCH 

Puts  out  a  record  of  length  HOWMUCH  starting  from 
location  WHAT  onto  the  output  device.  Which  device 
it  is  depends  upon  previous  calls  Into  deck  FILE  de¬ 
fining  the  output  device. 

Routine  SETS  has  set  PUT  as  a  transfer  to  the  appro¬ 
priate  already-initialized  device  "PUT"  routine  (e.g., 
TAPPUT,  DSKPUT,  etc.  ).  Control  goes  directly  to 
that  routine. 

FILE 

CARDRD 

Configuration-independent 

Entry  point 

TSX  CARDRD,  4 

PZE  USERBF, ,  LENGTH 

.  . .  end-of-file  return. .  . 

. .  .  redundancy  return.  . . 

. . .  normal  return.  . . 

Reads  one  card  image  record  into  user's  buffer  from 
input  device. 

SETS  has  set  CARDRD  to  be  a  transfer  to  the  appro¬ 
priate  device  GET  routine,  which  has  the  same  call¬ 
ing  sequence  (e.g.,  TAPGET). 

FILE 

SET 

Configuration-independent 
File  system  routine 
Command  entry  point 
Command card format: $ SET file-name TO device-name
Function: File-name is a name of the user's choosing. Device-name must
    be one specified by the device table (deck: DEVICE). The device
    associated with the device-name is now associated with the user's
    file-name, in that calling BCDCAN with this file-name in the future
    will obtain the "canonical" device address. File-names INPUT and
    OUTPUT are treated specially, for they refer to the system INPUT
    and OUTPUT files, respectively. When these file-names are set, the
    device descriptions of the corresponding device units are obtained,
    the routine CARDRD or PUT is set to point to the appropriate device
    routine, and the device's "open" entry is called. Hence,
    $ SET INPUT TO A6 would set CARDRD to the TAPGET routine, and would
    call TAPOPN with the appropriate parameters from the device
    description of A6.

Deck: FILE
Routine: SETS
Type: Configuration-independent; entry point
Calling sequence:
    CAL file-name
    LDQ device-name
    TSX SETS, 4
Function: Simulates the execution of the command
    $ SET file-name TO device-name.

Deck: FILE
Routine: BCDCAN
Type: Configuration-independent; entry point
Calling sequence:
    CAL file-name
    TSX BCDCAN, 4
Function: Returns in the AC the 15-bit "canonical address" of the
    device possessing the given file-name. A file-name is given a
    device by means of a SET card or a call to SETS.
Operation: Simply looks up the file-name in a table called CANTAB in
    the deck DEVICE, and returns the address of the corresponding
    device.

Deck: FILE
Routine: DEVCAN
Type: Configuration-independent; entry point
Calling sequence:
    CAL device-name
    TSX DEVCAN, 4
Function: Returns in the AC the 15-bit "canonical address" for the
    device whose printname in BCD appears in the AC. Simply looks it up
    in DEVTAB, which the user must have assembled with the system.

Deck: FILE
Routine: BUFFER
Type: Configuration-independent; entry point
Calling sequence:
    CAL LENGTH,,NAME
    TSX BUFFER, 4
Function: Returns in the AC the buffer control word associated with the
    15-bit NAME specified in the calling sequence. A buffer control
    word is of the form: PZE origin,,length. If there has not yet been
    a buffer associated with this name, a new buffer is fetched from
    available space (getting available core space is a system
    configuration-dependent process), and its control word is returned
    in the accumulator.
Operation: Scans a table, BUFTAB, which associates NAMEs with buffer
    origins (the lengths are the same as the LENGTH specified by the
    calling sequence). If a buffer with the requested NAME is found,
    its address and the user's LENGTH are returned. If not, the next
    free space in available buffer storage is obtained, an entry is
    created in BUFTAB, and the appropriate buffer control word is
    returned.
Restrictions: At present, buffers are obtained from a pool of maximum
    size 120, enough for 3 input and 3 output line buffers. The FILE
    system uses addresses of device-table entries as "names" of
    buffers.


DECK NAMED DEVICE

System entry points: DEVTAB CANTAB

Deck: DEVICE
Table: DEVTAB
Format: 2-word entries, terminated by a fence of PZE:
    BCI 1,device-name
    PZE "canonical address"
    The "canonical address" is a pointer to a device description. For
    input line files, the "device description" is a two-word entry:
    PZE open routine,,buffer length needed
    PZE get/put routine,,physical address

Deck: DEVICE
Table: CANTAB
Format: CANTAB is set by calls to SETS (other than with file-name
    INPUT or OUTPUT), and tested by BCDCAN. Entries are stored by SETS
    in the form:
    BCI 1,file-name
    PZE "canonical address"
    terminated by a fence of zeroes.
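
The relationship between DEVTAB, CANTAB, SETS, BCDCAN and DEVCAN described above can be sketched in a few lines of Python. This is only an illustrative model, not the 7094 code: the dictionaries stand in for the two-word table entries, the numeric "canonical addresses" are invented, and the special handling of file-names INPUT and OUTPUT is left out.

```python
# Hypothetical model of the DEVICE tables; addresses are made up.
DEVTAB = {"A2": 0o1000, "A3": 0o1002, "A6": 0o1004}  # device-name -> canonical address
CANTAB = {}                                          # file-name   -> canonical address


def devcan(device_name):
    """Like DEVCAN: look a device-name up in DEVTAB."""
    return DEVTAB[device_name]


def sets(file_name, device_name):
    """Like SETS for an ordinary file-name: record the binding in CANTAB."""
    CANTAB[file_name] = devcan(device_name)


def bcdcan(file_name):
    """Like BCDCAN: return the canonical address bound to a file-name."""
    return CANTAB[file_name]


sets("DICT", "A6")  # plays the role of "$ SET DICT TO A6"
```

After the call on the last line, `bcdcan("DICT")` and `devcan("A6")` yield the same canonical address, which is the property the SET command is described as establishing.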


ROUTINE FOR DECK NAMED SNTAPE

System entry points: SNTAP

Deck: SNTAPE
Routine: SNTAP
Type: 7094 tape I/O dependent; entry point
Calling sequence:
    TSX SNTAP, 4
    PZE USERBF,,RETURN
    ... end-of-file return ...
    ... redundancy return ...
    ... normal return ...
Function: Reads one logical record from tape A5 into the user's buffer.
    Tape A5 contains blocked records of maximum physical length 500,
    each of whose logical records is prefixed by a logical control word
    containing:
    PZE relative address of the control word of the next logical
        record,,number of words of current logical record
Restrictions: USERBF must be large enough to accommodate the largest
    logical record which has been put onto tape A5.


ROUTINES FOR DECK NAMED PRINT

System entry points: CHAROU FLUSH SETPUT EDIT EDITX INTPUT VARPUT
    ALFPUT BLNPUT SETOVF

Deck: PRINT
Routine: CHAROU
Type: Configuration-independent; entry point
Calling sequence:
    TSX CHAROU, 4
    PZE FROM,,BITCT
Function: Moves BITCT number of bits from the FROM address into the
    print line buffer (which must have been provided by SETPUT). If
    this causes an overflow past the "bell" on the line, only those
    bits up to the bell position are inserted, the overflow routine
    (set by SETOVF, or a standard overflow routine by default) is
    called, and the remaining bits are then inserted (the overflow
    routine has presumably made this possible). The current setting of
    the bell is after the 120th character position. CHAROU does not
    know this. The bell position is defined with the print buffer by
    SETPUT.

Deck: PRINT
Routine: FLUSH
Type: Configuration-independent; entry point
Calling sequence:
    TSX FLUSH, 4
Function: The print line which had been set up by CHAROU is now put out
    on the output device by calling PUT. Before this can be done, the
    last word must be padded on the right with blanks, and if there are
    fewer than three words, blank words must be inserted, so that the
    record put out by PUT is three words or longer (lest it look like a
    "noise record"). Then the CHAROU routine is reinitialized to start
    storing characters into bit position 1 again.

Deck: PRINT
Routine: SETPUT
Type: Configuration-independent; entry point
Calling sequence (at present):
          CAL POINT
          TSX SETPUT, 4
    POINT PZE BUFBL
Function: Initializes the routine CHAROU by providing a buffer. To
    initialize a buffer of length 22, with "bell" after 20, BUFBL
    should be:
    BSS 5            (words used by CHAROU)
    PZE address of overflow routine
    PZE Line+20,,20
    PZE Line+22,,2
    PZE 10,,36       (decrement always 36)
    PZE Line+20,,20  (always identical to word 6)
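
The interplay of CHAROU, FLUSH and the default overflow routine (called STAND in the SETOVF entry) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: it counts 6-bit characters instead of bits, keeps the line as a Python list, and collects the records that would go to PUT in a list named `output`. The class and attribute names are hypothetical and do not come from the PRINT deck.

```python
WORD_CHARS = 6  # one 36-bit word holds six 6-bit BCD characters


class PrintLine:
    def __init__(self, bell=120, put=None, overflow=None):
        self.bell = bell          # the "bell": last usable character position
        self.chars = []           # characters placed on the current line
        self.output = []          # records handed to the stand-in for PUT
        self.put = put or self.output.append
        self.overflow = overflow or self.stand

    def charou(self, text):
        """Append text; on overflow fill up to the bell, call the overflow
        routine once, then insert the remainder."""
        room = self.bell - len(self.chars)
        if len(text) > room:
            self.chars.extend(text[:room])
            self.overflow()
            text = text[room:]
        self.chars.extend(text)

    def flush(self):
        """Pad the last word with blanks, force at least three words, PUT."""
        line = "".join(self.chars)
        words = -(-len(line) // WORD_CHARS)          # ceiling division
        width = max(3, words) * WORD_CHARS
        self.put(line.ljust(width))
        self.chars = []

    def stand(self):
        """Default overflow routine: end the line, then put one blank into
        the carriage control position of the next line."""
        self.flush()
        self.charou(" ")
```

With a bell of 10, writing "ABCDEFGH" and then "IJKL" fills the first line with ten characters, flushes it padded to three words, and starts the next line with a blank followed by "KL".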

Deck: PRINT
Routines: EDIT, EDITX
Type: Configuration-independent; entry point
Calling sequence:
    TSX EDIT or EDITX, 4
    (see R.C.C. manual for commands)
Function: Processes the commands defined for EDIT as defined in the IBM
    Research Center Computing Manual -- the restricted class of
    permitted commands includes BC., BK., IN., and OC. The differences
    between the function of this routine and the function of the IBM
    EDIT are: Output is sent, via PUT, to whatever device the I/O
    system is using for output. The I/O system also checks for end of
    output tape. Integer specification prints unsigned numbers -- no
    space is needed for a sign. The OC specification is not used for
    octal -- it is used for a new kind of integer specification whereby
    the length need not be defined by the user -- the number of
    significant digits is used as the length.
Operation: All EDIT commands end with either W (write print line on
    output device), E (terminate calling sequence), N (neither), or B
    (both). In addition, the command specifies the appropriate format
    conversion. The EDIT subroutine in this system takes each
    individual edit command, determines the effective address (from
    actual address and index register, if specified), and then calls
    either BLNPUT, ALFPUT, VARPUT, or INTPUT, depending on the command.
    If the command was for write print line, FLUSH is called after
    executing the command. If the command specified E for end calling
    sequence, the routine returns to the caller.

Deck: PRINT
Routine: INTPUT
Type: Configuration-independent; entry point
Calling sequence:
    TSX INTPUT, 4
    PZE NUMBER,,LENGTH
    where NUMBER is the address of an integer less than 1,000,000 and
    LENGTH is the number of print positions to be used.
Function: Inserts the specified integer into the print line, using
    LENGTH number of print positions. The integer is padded on the left
    with blanks, if necessary.
Operation: Calls BINBCD to convert the integer. Then pads with blanks.
    Then calls CHAROU to put out the converted integer, making sure
    that the bit count is 6 times the length count of characters.

Deck: PRINT
Routine: VARPUT
Type: Configuration-independent; entry point
Calling sequence:
    TSX VARPUT, 4
    PZE NUMBER
Function: NUMBER is the address of an integer less than 1,000,000. The
    significant digits of this number are put into the print line. No
    additional blanks are inserted.
Operation: BINBCD is called to convert the integer. The XR2 will tell
    how many significant digits there were, and CHAROU is then called
    to put out 6 times that number of bits.

Deck: PRINT
Routine: ALFPUT
Type: Configuration-independent; entry point
Calling sequence:
    TSX ALFPUT, 4
    PZE WHERE,,COUNT
Function: A COUNT number of characters is entered into the print line
    from the location WHERE (and beyond, if more than 6 characters are
    entered).
Operation: CHAROU is simply called with a bit count of 6 times COUNT.

Deck: PRINT
Routine: BLNPUT
Type: Configuration-independent; entry point
Calling sequence:
    TSX BLNPUT, 4
    PZE ,,NUMBER
Function: Puts NUMBER blanks into a print line.
Operation: Puts 6 times NUMBER bits of blanks into the print line with
    CHAROU.


Deck: PRINT
Routine: SETOVF
Type: Configuration-independent; entry point
Calling sequence:
    TSX SETOVF, 4
    PZE RTN   (zero for resume "default" overflow routine)
Function: The next time a call to CHAROU causes bits to be entered past
    the "bell" on a line, the routine RTN is TSXed to. If no overflow
    routine is specified, the program will TSX STAND, 4 and the routine
    STAND will simply call FLUSH to terminate the current line, and
    call CHAROU with a single blank to insert into the carriage control
    position of the next line.


ROUTINE FOR DECK NAMED BINBCD

System entry point: BINBCD

Deck: BINBCD
Routine: BINBCD
Type: Configuration-independent; entry point
Calling sequence:
    CLA NUMBER
    TSX BINBCD
Function: The AC contains an unsigned number less than 1,000,000. The
    result is 6 BCD digits in the AC in printable form. Index register
    2 will contain the number of significant digits of the result.
    Non-significant zeroes are not blanked out -- the caller may blank
    them out if he wishes, since index register 2 already contains the
    number of significant digits.
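
As an illustration of the contract just described, and of how INTPUT and VARPUT build on it, here is a minimal Python sketch. The function names mirror the routines but the code is only a model: the digits are returned as a Python string rather than BCD in the AC, the significant-digit count is returned as a second value rather than left in index register 2, and treating zero as having one significant digit is an assumption that the text does not settle.

```python
def binbcd(number):
    """Return (six printable digits, count of significant digits)."""
    if not 0 <= number < 1_000_000:
        raise ValueError("BINBCD accepts unsigned numbers below 1,000,000")
    digits = f"{number:06d}"          # leading zeroes are NOT blanked out
    significant = len(str(number))    # assumption: 0 counts as one digit
    return digits, significant


def varput(number):
    """Like VARPUT: only the significant digits, no extra blanks."""
    digits, significant = binbcd(number)
    return digits[-significant:]


def intput(number, length):
    """Like INTPUT: the integer padded on the left with blanks to LENGTH.
    What happens when LENGTH is too small is not specified in the text;
    this sketch simply returns the unpadded digits in that case."""
    return varput(number).rjust(length)
```

For example, `binbcd(42)` gives the six digits `000042` together with a count of 2, and the caller decides whether to blank the four leading zeroes.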

ROUTINES FOR DECK NAMED MONIT

System entry points: RETURN READIN STACK COMMAN RNEXT CALL DATSAV
    DSTACK

General description: READIN reads a card (using SENTRD) and goes to the
    routine specified by the first word after $ of that card. $ must be
    in column 1, or the card will be skipped. STACK holds the arguments
    on the card. COMMAN transfers to the routine specified by the $
    card in STACK, assumed to have already been read in. RNEXT reads
    the next card and goes to the routine specified by the $ card.
    RETURN goes back to the routine which last called READIN. CALL
    first copies the user's stack into STACK, and from there on acts
    like COMMAN. Needs table COMTAB containing pairs of entries
    BCI 1,NAME and PZE entry point. This is in deck COMMAN. DATSAV
    moves the date information from the $DATE card, which was
    temporarily copied into STACK, into DSTACK for any date information
    printings.

Deck: MONIT
Routine: CALL
Type: Configuration-independent; monitor; entry point
Calling sequence:
    TSX CALL, 4
    PZE WHENCE
Function: Copies the user's stack, starting from WHENCE, into the
    monitor STACK. Then prints STACK via PRBCD. Then goes to COMMAN to
    execute the command.
Restriction: The monitor STACK is currently only of length 24.

Deck: MONIT
Routine: READIN
Type: Configuration-independent; monitor; entry point; needed for CSA
    monitor operations
Calling sequence:
    TSX READIN, 4
Function: Initial entry to monitor. (OPENIO must have been called
    first.) Skips to a $ card, then (at COMMAN) transfers control to
    the routine specified by the $ card. The card is saved in STACK for
    reference by the routine. Control is returned to the user when some
    program calls TRA RETURN.
Operation: Calls SENTRD to read into STACK. When the word at STACK is
    $, takes STACK + 1 and scans down COMTAB for a matching BCD word,
    which must be found, or else an error message is given and the job
    is killed. When the word is found, the corresponding entry point is
    transferred to. Index register 4 is saved so that RETURN can give
    control back to the caller.


Deck: MONIT
Routine: DATSAV
Type: Configuration-independent; monitor; entry point
Calling sequence: None
Function: When a $DATE card is read, the monitor transfers via the
    COMTAB entry
    BCI 1,DATE
    PZE DATSAV
    to this routine, which moves the date information from the
    temporary STACK into the permanent DSTACK, from which it can be
    printed anytime by the DATE routine.


DECK NAMED COMMAN

System entry point: COMTAB

Deck: COMMAN
Table name: COMTAB
Type: Used by MONIT -- must be present on all jobs with monitor;
    configuration-independent; table
Format: A list of two-word entries, terminated by a fence of zeroes.
    Each entry is of the form:
    BCI 1,NAME
    PZE TRANSFER POINT
Function: To use the monitor system, supply a set of COMTAB entries,
    each containing as NAME the first 6 characters of the name of the
    desired command, and as transfer point the address of the routine
    to process the appropriate command.
Operation: When the monitor is in control, $ cards are read into STACK
    (an entry point in MONIT). The word after the dollar sign is looked
    up in COMTAB and control is then transferred to the entry point
    corresponding to the matched COMTAB entry. Other parameters may
    appear on the $ card, and the monitor may interrogate these
    parameters by referencing the STACK. STACK itself will contain a $,
    STACK + 1 the name of the command, and the remaining locations will
    contain the parameters.
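
The COMTAB lookup described above amounts to a linear scan of name/entry-point pairs. The Python sketch below shows the idea under simplifying assumptions: STACK is a list of strings, COMTAB is a list of (name, function) pairs, and the "job is killed" case is represented by raising an exception. The names `run_command` and the sample routines are hypothetical.

```python
def run_command(stack, comtab):
    """Dispatch a $ card held in `stack` through `comtab`.

    stack[0] must be "$", stack[1] is the command name, and the remaining
    elements are the parameters, as laid out in the text.
    """
    if len(stack) < 2 or stack[0] != "$":
        return None                      # not a command card: skipped
    name = stack[1][:6]                  # COMTAB holds the first 6 characters
    for entry_name, routine in comtab:   # scan down the table
        if entry_name == name:
            return routine(stack)        # "transfer" to the entry point
    raise LookupError(f"command {name!r} not found in COMTAB")


# Sample table with two made-up command routines.
COMTAB = [
    ("DATE", lambda stack: ("date", stack[2:])),
    ("DUMP", lambda stack: ("dump", stack[2:])),
]
```

Calling `run_command(["$", "DUMP", "WORDS", "RULTAB"], COMTAB)` reaches the second routine, which sees the parameters WORDS and RULTAB in the stack.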


ROUTINES FOR DECK NAMED READ

System entry points: SENTRD OPENIO

Deck: READ
Routine: OPENIO
Type: 7094/configuration-dependent; entry point
Calling sequence:
    TSX OPENIO, 4
Function: This routine must be called to initialize input-output for
    the particular system configuration at a given installation. At the
    IBM Research Computing Center, OPENIO interrogates sense switch 1.
    If it is on, the routine calls SETS to define the INPUT file as the
    A2 device and OUTPUT as A3; if it is off, the routine defines the
    INPUT file as DISKIN and OUTPUT as DISKOU. These devices must
    appear in the DEVICE deck's table if the corresponding sense switch
    setting is used. Since at the moment only A2 and A3 (and other
    tapes) appear in the DEVICE table, sense switch 1 can only be on.
Operation: Checks the sense switch settings, and calls SETS in the FILE
    deck with the appropriate device names. Calls SETPUT in PRINT to
    provide a print line buffer for use by EDIT (and any routines
    calling CHAROU).

Deck: READ
Routine: SENTRD
Type: Configuration-independent; entry point
Calling sequence:
    TSX SENTRD, 4
    PZE STACK,,SIZE
Function: The user can specify a STACK of any size he wishes. SENTRD
    will read a "logical card". A logical card runs from column 1
    through column 72, unless column 72 is punched with an 11-punch
    (minus sign), in which case it includes columns 1-71 plus a
    continuation from the next physical card. Any number of physical
    cards can form a logical card as long as a minus sign in column 72
    indicates that the next card is a continuation. SENTRD looks for
    strings of consecutive alphabetic characters, separated by blanks
    or by "break characters" (comma, dollar sign, period, parentheses,
    slash, equal sign). Each string of alphabetic characters between
    blanks or breaks, as well as each individual break character, is
    stored, left justified, padded with blanks, in the next free
    location in STACK. If a string is longer than 6 characters, the
    components of the string are separated by a "logical concatenator"
    symbol consisting of the word 767676767676. Blanks from cards,
    which are used only to delimit fields of text characters, are not
    put into the STACK. After the last word has been stored in STACK, a
    fence of 777777777777 is stored. SIZE is the maximum number of
    words (including the fence) which will be stored in the STACK. If
    the logical card has too many fields, the overflow will be lost and
    the fence stored in the last location.

    Example: Suppose the logical card contains:

    ABC DEF K/L C, D=EFGHIJK

    The stack will contain:

    ABC
    DEF
    K
    /
    L
    C
    ,
    D
    =
    EFGHIJ
    767676767676
    K
    777777777777
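
The splitting rules for a logical card can be made concrete with a short Python sketch. It is a model under stated assumptions rather than the SENTRD code: the card is an ordinary string, the stack is a list of 6-character strings, the two octal marker words are represented by their printed digits, any non-blank, non-break character is treated as part of a string, and continuation cards (minus sign in column 72) are not handled.

```python
BREAKS = set(",$.()/=")          # the break characters listed in the text
CONCAT = "767676767676"          # "logical concatenator" word
FENCE = "777777777777"           # fence stored after the last word


def sentrd(card, size):
    """Split one logical card into 6-character stack words."""
    stack = []
    i = 0
    while i < len(card):
        ch = card[i]
        if ch == " ":                       # blanks only delimit fields
            i += 1
        elif ch in BREAKS:                  # each break is its own word
            stack.append(ch.ljust(6))
            i += 1
        else:
            j = i
            while j < len(card) and card[j] != " " and card[j] not in BREAKS:
                j += 1
            string = card[i:j]
            for k in range(0, len(string), 6):
                if k:                       # join pieces of a long string
                    stack.append(CONCAT)
                stack.append(string[k:k + 6].ljust(6))
            i = j
    # SIZE counts the fence; anything beyond SIZE - 1 words is lost.
    return stack[:size - 1] + [FENCE]
```

Applied to the card in the example above with a large enough SIZE, this produces the same thirteen words, including the stored comma and equal sign.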


ROUTINES FOR DECK NAMED HASHTG

System entry points: HASHTG PUTTAG CLEAR CANCEL

General description: These routines are used to develop tags created
    either as a result of reading the codings of the input items or as
    a result of creation by the GETRL program for creating new
    constituents. In either case, the tags are generated linearly,
    starting with the first attribute, followed by its value(s), and so
    on. Each time one wishes to append an 18-bit tag element to the
    list of tags currently being generated, one calls PUTTAG. When one
    has completed an entire data constituent after calling PUTTAG a
    number of times, one then calls HASHTG and gets the address of this
    newly created "data constituent". To ignore previous calls to
    PUTTAG since the last call to HASHTG, call CANCEL. Thereafter, the
    next call to PUTTAG will start at the beginning of a new
    constituent. To clear the TAGS table stored in HASHTG, call CLEAR.

Deck: HASHTG
Routine: PUTTAG
Type: Configuration-independent; entry point
Calling sequence:
    CAL component
    TSX PUTTAG, 4
    where component is of the form:
    PZE 18-bit tag component
Function: PUTTAG appends this 18-bit tag component to the other 18-bit
    tag components created since the last call to HASHTG or CANCEL.
    When HASHTG is called, the string of 18-bit components stored via
    PUTTAG will be stored away and the address of the origin of the
    data constituent will be returned by HASHTG.
Operation: PUTTAG uses the table TEMPTG (size 100) to store these
    18-bit quantities, half-word by half-word. It keeps a running
    checksum of these quantities, to be used by HASHTG as a "hash sum".

Deck: HASHTG
Routine: HASHTG
Type: Configuration-independent; entry point
Calling sequence:
    TSX HASHTG, 4
Function: If PUTTAG has been called one or more times since the last
    call to HASHTG or CANCEL, HASHTG inserts the stacked-up tags (which
    have been put 18 bits at a time onto TEMPTG by PUTTAG) into the tag
    table, TAGS (length 8000), provided that the same string has not
    already been stored. In either case, HASHTG returns in the AC
    address field the address of the beginning of this tag string. The
    AC sign is "-" if the tag has previously been stored in TAGS.
Operation: The 18-bit checksum of the components which has been built
    up in PUTTAG is now searched for in a hash table of length 1024
    named TAGPS. Each TAGPS entry contains the 18-bit checksum and a
    pointer to the entry in TAGS containing the tag string. When a
    checksum is generated, an entry is looked for in TAGPS (by hashing
    technique). If no entry is found, one is inserted, and the tag is
    moved into the next free space in TAGS. If an entry is found in
    TAGPS, the tag pointed to by it is compared word for word with the
    tag in TEMPTG. Usually these will be the same, for it is a
    coincidence indeed if two different tags have the same 18-bit hash
    sum, but the check is made anyhow. If the tag is already in the
    TAGS table, then the TEMPTG entries are not copied. If the tag is
    found not to be in TAGS, it is moved in, and a TAGPS entry is
    created.
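
The PUTTAG/HASHTG scheme is a form of interning: identical tag strings are stored once and identified by address. The Python sketch below models it under several assumptions that the text leaves open: the running checksum is taken to be an 18-bit sum, collisions in TAGPS are resolved by stepping to the next slot, and each TAGPS entry records the length of the tag string as well as its start. The class name `TagStore` is hypothetical.

```python
TAGPS_SIZE = 1024
MASK18 = (1 << 18) - 1


class TagStore:
    def __init__(self):
        self.temptg = []                    # components since last HASHTG/CANCEL
        self.checksum = 0                   # running 18-bit "hash sum"
        self.tags = []                      # TAGS: all stored tag strings, flat
        self.tagps = [None] * TAGPS_SIZE    # (checksum, start, length) or None

    def puttag(self, component):
        component &= MASK18
        self.temptg.append(component)
        self.checksum = (self.checksum + component) & MASK18

    def cancel(self):
        self.temptg = []
        self.checksum = 0

    def hashtg(self):
        """Return (start address in TAGS, True if it was already stored)."""
        slot = self.checksum % TAGPS_SIZE
        for _ in range(TAGPS_SIZE):
            entry = self.tagps[slot]
            if entry is None:               # not stored yet: move it into TAGS
                start = len(self.tags)
                self.tags.extend(self.temptg)
                self.tagps[slot] = (self.checksum, start, len(self.temptg))
                self.cancel()
                return start, False
            checksum, start, length = entry
            # Same checksum is not enough: compare the strings themselves.
            if checksum == self.checksum and self.tags[start:start + length] == self.temptg:
                self.cancel()
                return start, True
            slot = (slot + 1) % TAGPS_SIZE  # assumed collision rule
        raise MemoryError("TAGPS is full")
```

Two tag strings with equal checksums but different components (for instance 1,2 and 2,1) end up at different addresses, which is why the word-for-word comparison is made even though such coincidences are rare.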

Deck: HASHTG
Routine: CLEAR
Type: Configuration-independent; entry point
Calling sequence:
    TSX CLEAR, 4
Function: Clears the TAGS table and the TAGPS table.

Deck: HASHTG
Routine: CANCEL
Type: Configuration-independent; entry point
Calling sequence:
    TSX CANCEL, 4
Function: Causes the TEMPTG table, which has held the string of tags
    generated by calls to PUTTAG since the last call to HASHTG, to be
    reset -- the next call to PUTTAG will start over by storing into
    the beginning of TEMPTG, as if the latest calls to PUTTAG had never
    been issued.
Operation: Resets the TEMPTG table, zeroes the running hash sum of
    18-bit items, turns off the switch used to indicate that no tags
    have been stored yet.


ROUTINES FOR DECK NAMED HASH

System entry points: HASHPR CLRTEM HASHBC

Deck: HASH
Routine: HASHBC
Type: Configuration-independent; entry point
Calling sequence:
    CAL WORD
    TSX HASHBC, 4
    PFX 0
    (where PFX is ONE, TWO, THREE, or FOUR -- see below)
Function: If the word in the AC is stored in a table of BCD
    constituents, then the word is not stored again -- the address of
    the BCD constituent is retrieved. Thus, HASHBC converts BCD words
    to addresses pointing to a place where these BCD words are stored.
    There are two tables where BCD words are stored: the first, called
    CONTAB (length 997), is never cleared; the second, called ALTTAB
    (the alternative table, length 299), is cleared by calling CLRTEM
    (this is done in the CSA at the start of each sentence). The prefix
    determines which table is used and whether the value must be
    already present. ONE means the value must be in CONTAB; TWO means
    the value is put in CONTAB if not already there; THREE means if the
    value is not in CONTAB it must be in ALTTAB; FOUR means if not in
    CONTAB, try ALTTAB, and if not there, put it in. AC returns the
    address where the constituent is stored, or -0 if PFX was ONE or
    THREE and the symbol was not found.
Operation: To hash a BCD word, turn it into a relative address in a
    narrower range than the totality of BCD words by applying a
    "hashing function". In this case, the hashing function consists in
    taking the remainder after dividing the BCD word by 997, and is
    hence a relative address in CONTAB. We then examine this location
    in CONTAB -- if it is empty, we know the symbol was never stored in
    CONTAB, for if it were it would be put in the first empty location;
    if it is not empty, its contents are compared with the symbol. If a
    complete match occurs, then we have found the symbol and return the
    absolute address. If the symbols are different, we must examine the
    next location of the table and test for equality or emptiness. An
    ideal size for a hash table is roughly twice the size of the number
    of items to be stored in it. Under these conditions, the expected
    number of comparisons needed before finding the address for a
BCD  word  is  of  the  order  of  1.  ?.. 
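
The probing scheme described for CONTAB (remainder after division by the table length, then stepping to the next location until a match or an empty slot is found) can be sketched as follows. This is a model, not the HASH deck: a word is packed into an integer from its ASCII bytes rather than 7094 BCD codes, the returned value is a table index rather than an absolute address, `None` stands in for the -0 result, and only the CONTAB behaviour of prefixes ONE and TWO is shown (ALTTAB is omitted). The class name `Contab` is hypothetical.

```python
CONTAB_SIZE = 997


class Contab:
    def __init__(self, size=CONTAB_SIZE):
        self.size = size
        self.slots = [None] * size

    def lookup(self, word, insert):
        """insert=False behaves like prefix ONE, insert=True like TWO."""
        key = word.ljust(6)[:6]                       # one 6-character word
        slot = int.from_bytes(key.encode("ascii"), "big") % self.size
        for _ in range(self.size):
            if self.slots[slot] is None:              # empty: never stored
                if not insert:
                    return None                       # stands in for -0
                self.slots[slot] = key
                return slot
            if self.slots[slot] == key:               # complete match
                return slot
            slot = (slot + 1) % self.size             # examine next location
        raise MemoryError("table is full")
```

A word that was never inserted is reported as absent under insert=False, while inserting it and looking it up again returns the same slot both times.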

Deck: HASH
Routine: CLRTEM
Type: Configuration-independent; entry point
Calling sequence:
    TSX CLRTEM, 4
Function: Clears the table ALTTAB used by HASHBC to store all BCD words
    not already stored in CONTAB. The CSA calls this routine at the
    start of the read-in of each sentence.

Deck: HASH
Routine: HASHPR
Type: Configuration-independent; entry point
Calling sequence:
         CAL PAIR
         TSX HASHPR, 4
    PAIR PZE part-of-speech address,,part-of-speech address

Function:  If  there  is  a  RULTAB  entry  for  this  pair  of  parts  of 

speech,  the  AC  sign  will  be  set  If  there  is  no 

such  entry,  the  sign  will  be  set  Index  register 

2  is  set  to  the  complement  of  the  relative  address  of 
where  the  rule  entry  is  stored,  if  it  is  stored,  and 
where  it  should  be  stored  if  it  is  not  stored:  i.e., 

CAL  RULTAB,  2  will  fetch  the  first  word  of  the 
RULTAB  entry  for  the  given  pair. 

Operation: HASHPR uses a hash addressing scheme similar to that used by
    HASHBC, except that each relative address is a multiple of 4, since
    RULTAB contains 4-word entries.


ROUTINE FOR DECK NAMED PRBCD

System entry point: PRBCD

Deck: PRBCD
Routine: PRBCD
Type: Configuration-independent; entry point
Calling sequence:
    TSX PRBCD, 4
    PZE STACK
Function: Prints a STACK read by SENTRD. Inserts blanks around break
    characters, except that blanks are suppressed before /, comma, and
    period.

ROUTINES FOR DECK NAMED PRTAG

System entry points: PRTAG PRWORD REGION

Deck: PRTAG
Routine: PRTAG
Type: Configuration-independent; entry point
Calling sequence:
          CAL CONST
          TSX PRTAG, 4
    CONST PZE part-of-speech,,tags
Function: Prints a line containing the part of speech followed by the
    data tags for a given data constituent. The line prints the data
    tags as they would have appeared on an input card for those data
    tags (except that the null value for an attribute is printed blank
    instead of *).
Operation: Scans the string of 18-bit sections forming the data tag,
    recognizing the beginning of new attributes and values, and
    separating attribute from value by /, the values by commas, and the
    separate tags by blanks. Uses PRWORD to print the non-blank
    portions of the symbols.

Deck: PRTAG
Routine: PRWORD
Type: Configuration-independent; entry point
Calling sequence:
    CAL SYMBOL
    TSX PRWORD, 4
Function: SYMBOL is a BCD character string with trailing blanks. PRWORD
    puts out the significant BCD characters without the trailing blanks
    onto the print line. (It calls EDIT to lay out this print line.)


ROUTINE FOR DECK NAMED PRRULE

System entry point: PRRULE

Deck: PRRULE
Routine: PRRULE
Type: Configuration-independent; entry point
Calling sequence:
            CAL ADDRESS
            TSX PRRULE, 4
            PZE INDENT
            PZE RTCON,,LFTCON
    ADDRESS PZE address of start of subrule
Function: To print out the subrule pointed to in the AC whose
    part-of-speech names (which do not appear in the tags) are RTCON
    and LFTCON. Prints the left half of the subrule on the current
    line, and the right half of the subrule on the next line after
    indenting INDENT number of spaces. Returns the original AC input.


ROUTINES  FOR  DECK  NAMED  DUMP 


Syatem  entry  point*:  TDUMP  DWORD  DRULE  DBIG  DPROC 

RULSW 


Deck: 

Command  entry  point : 
Type: 


Command  cardforr.\at: 


Function: 

Operation: 


DUMP 

TDUMP 

CSA  monitor -dependent 
Configuration-independent 
Command  entry 

$  DUMP  parameter-1  parameter-2  . .  .  parameter-n 
where  parameter*  are  any  of:  WORDS,  RULTAB, 
PROGRAM,  BIGTAB. 

Causes  dump  of  specified  memory  areas  used  by 
CSA  system. 

For  each  parameter,  calls  one  of  the  associated  sub 
routines  DWORD,  DRULE,  DBIG,  and  DPROG. 


Deck: 

Routine: 

Type: 

Calling  sequence; 
Function: 


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


DUMP 

DBIG

Configuration-independent
Entry point
TSX DBIG, 2

Lists the BIGTAB table resulting from syntactic analysis
of a sentence. For internal format of BIGTAB,
see description of tables in COMMON area at the end
of this manual. Before listing BIGTAB, it prints
sentence number and date line.

DUMP 

DPROG 

Configuration-independent
Entry point
TSX DPROG, 2

Dump all of core from location 0 to the entry point of
a supplied program called END.


Deck: 
Routine:

Type: 


DUMP 

DRULE 

Configuration-independent
Entry  point 
Calling sequence:
Function: 

Operation: 


ROUTINES  FOR  DECK 

System  entry  points; 

Deck: 

Routine: 

Type: 

Calling  sequence: 

Function: 


Operation: 


TSX  DRULE,  L 

Lists  all  rules  in  the  rule  table  grouped  in  packets  of 
subrules.  If  ERRSW  is  on  (meaning  grammar  com¬ 
pilation  unsuccessful),  there  is  no  printout. 

Scans through the hashed RULTAB. For all non-empty
entries, discovers number of subrules in each
rule, and address of first subrule. Since each subrule
points to the next consecutive subrule, can print
each subrule within a particular rule grouping. It
prints these subrules by calling routine PRRULE.

For format of RULTAB, see description of tables in
COMMON area at the end of this manual.


NAMED PRANS

PRANS STATIS

PRANS

PRANS

Configuration -independent 
Entry  point 
TSX  PRANS,  4 

PZE Complement of first empty space in BIGTAB
PZE Complement of twice number of last word in
sentence

PZE Address of first sentence symbol, , number of
such

PRANS will print all constituents which span from
the first to the last word of the sentence, and whose
part of speech is a user-defined "sentence symbol"
(e.g., "S" or "PRED"). PRANS prints these in tree
form, showing each node with all its tags, the line
in BIGTAB corresponding to it, and the level number
on the tree. In addition, for each tree printed,
PRANS will print a list of all values appearing on
the tag "R" or "RULE" attribute for a node. If option
TNODE is specified, all nodes which were suppressed
in execution because they were identical in
span and in tags to another are printed out with their
constituent trees.

When PRANS finds the top of a tree to be printed out,
it scans down the tree. It saves the not yet printed
branch address for level n in the nth level of a pushdown
list, PUSH1. To print a branch, it calls
MODUL.
Deck:

Routine:

Type:

Calling sequence:


Operation:


PRANS
MODUL

Configuration-independent
Internal to PRANS
TSX MODUL, 7

(Index register 2 is the complement of the relative
address in BIGTAB of the node to be printed.)

Used with the generator program to generate nodes.
MODUL prints the node at BIGTAB + 2, 2 (third
word in entry, pointing to part of speech and tags)
using PRTAG, after it has indented a number of
indentation units equal to the level count. At the
position where constituents for level n are printed
out, MODUL prints the character "I" for all levels
where a node is not yet printed out, giving the character
of a tree with vertical lines linking pairs of
nodes on the same level.


ROUTINE  FOR  DECK  NAMED  DATE 


System  entry  point:  DATE 


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


DATE 

DATE 

Configuration-independent 
Entry  point 
TSX  DATE,  4 

Prints  sentence  number  and  date  information  of  the 
latest  $DATE  card  in  one  line.  Uses  PRBCD  for 
printing  date  information  from  the  monitor  date 
stack. 


ROUTINES  FOR  DECK  NAMED  GRAMAR 


System  entry  points:  GRAMAR  ERRSW 


Deck: 

Command  entry  point: 
Type: 


Command  card  format: 
Function: 


GRAMAR 

GRAMAR 

CSA  monitor-dependent 
Configuration -independent 
Command  entry  point 
$  GRAMMAR 

Reads  rule  packets  following  $  GRAMMAR  card  until 
another  dollar  sign  card  or  end  of  file  is  reached. 
Operation:


Deck:

Routine:

Type:

Calling sequence:

TAG

Function:


Operation:


Deck:

Routine:

Type:

Calling sequence:


Compiles these rule packets into entries in RULTAB
(see description of tables in COMMON area) pointing
to subrules stored internally in TAGTAB. Lists any
formation errors in these rules, and turns on the
switch ERRSW (an entry point) if there are any errors.
Reads a card serving as a rule header, and calls
HASHPR to get a RULTAB entry for a pair of parts
of speech. All symbols are first converted to 15-bit
addresses by calling HASHBC (specifying the "permanent"
hash table). (See "HASHBC".) Reads subrule
cards, and inserts the compiled tag conditions into
the table TAGTAB. Uses subroutines NEXT and PUT
(internal routines) to fetch constituents from input
cards and to store generated rule tags.

GRAMAR
PUT, PUTAB
Configuration-independent
Internal  to  GRAMAR 
CAL  TAG 
TSX  PUT,  4 

BCI  1,  SYMBOL 

PUT converts the symbol in the AC to a 15-bit address.
It then inserts 3 additional bits: bit 18 if
switch ATTR is on, bit 19 if NEWCN is on, and bit
20 if switch MINUS is on. It resets the first two of
these switches after every call. The 18-bit quantity
then obtained is inserted into the next free position
in the TAGTAB stored in GRAMAR to hold subrule
tags. Where the conversion to a 15-bit address
must be avoided, i.e., for symbols 1 and 2, calls
PUTAB instead of PUT.

PUT first calls HASHBC to get a 15-bit address.
PUTAB skips this step. Then the additional bits
are inserted, the switches reset, and the result
stored in the next free TAGTAB location.
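The packing step can be illustrated with a short Python sketch. It assumes that machine bits 18, 19 and 20 are the three high-order bits of the 18-bit half word, in that order, which agrees with the value 600000 given in the TAGTAB format for a tagless constituent. The function names are hypothetical and the hashing done by HASHBC is not modeled.

```python
ADDRESS_MASK = 0o077777  # low 15 bits: hashed symbol address
MINUS_BIT = 0o100000     # bit 20: switch MINUS (negative sign)
NEWCN_BIT = 0o200000     # bit 19: switch NEWCN (new constituent)
ATTR_BIT = 0o400000      # bit 18: switch ATTR (new attribute)


def pack_tag(address, attr=False, newcn=False, minus=False):
    """Combine a 15-bit address with the three switch bits (assumed layout)."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in 15 bits")
    word = address
    if attr:
        word |= ATTR_BIT
    if newcn:
        word |= NEWCN_BIT
    if minus:
        word |= MINUS_BIT
    return word


def unpack_tag(word):
    """Return (address, attr, newcn, minus) for an 18-bit quantity."""
    return (word & ADDRESS_MASK,
            bool(word & ATTR_BIT),
            bool(word & NEWCN_BIT),
            bool(word & MINUS_BIT))
```

With this layout, pack_tag(0, attr=True, newcn=True) gives octal 600000.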

GRAMAR

NEXT 

Configuration -independent 
Internal  to  GRAMAR 
TSX  NEXT,  4 

. . . exit if character in "STACK" was =

. . . exit if character was +

. . . exit if character was , or /( )

. . . exit if character was /
Function:


. . . exit if character was anything else.

Gets next item in "STACK" resulting from reading a
subrule, and transfers to the appropriate exit. A
"STACK" is defined on the card specifying subroutine
SENTRD.


Deck: 

Table: 

Type: 

Size: 

Format  of  a  single 


GRAMAR 

TAGTAB 

CSA  subrule  table 
7500 
subrule: 

MZE symbolic label (if any), , address next subrule
PZE success exit, , failure exit
Strings of 18 bits as follows:

XYZ address of attribute or value

X = 1 for start new attribute, 0 for value

Y = 1 for start new constituent (C1, C2, or C3)

Z = 1 for negative sign (all constants following X-
on a tag condition)

Numbers on rules (e.g., ETC/2 or ATT/2-X) are
stored as absolute values.

A fence of 077777 terminates a subrule.

If a constituent is tagless, then 600000 is present to
indicate new constituent but zero attribute.

/* appears as half word 000000.
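As an illustration of this layout, the following Python sketch walks the 18-bit strings of one subrule up to the fence and groups them by constituent and attribute. It assumes X, Y and Z are the three high-order bits of each half word (consistent with 600000 marking a tagless constituent). The function name and the returned structure are hypothetical, and the two header words (label and exits) are assumed to have been removed already.

```python
FENCE = 0o077777


def decode_subrule(halfwords):
    """Group 18-bit strings into constituents, attributes and values.

    Returns a list of constituents; each constituent is a list of
    {"attribute": address, "values": [(address, negative), ...]}.
    """
    constituents = []
    for word in halfwords:
        if word == FENCE:                    # fence terminates the subrule
            break
        address = word & 0o077777
        new_attribute = bool(word & 0o400000)    # X
        new_constituent = bool(word & 0o200000)  # Y
        negative = bool(word & 0o100000)         # Z
        if new_constituent:
            constituents.append([])
        if not constituents:
            raise ValueError("value or attribute before any constituent")
        if new_attribute:
            if new_constituent and address == 0:
                continue                     # 600000: tagless constituent
            constituents[-1].append({"attribute": address, "values": []})
        else:
            if not constituents[-1]:
                raise ValueError("value before any attribute")
            constituents[-1][-1]["values"].append((address, negative))
    return constituents
```

Under these assumptions a half word of 000000 (the /* case) simply decodes as a value with address zero.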


ROUTINE  FOR  DECK  NAMED  SENTNC 


System  entry  points:  SENTNC  TAPSW  COUNTS 


Deck: 

Command  entry  point: 
Type:


Command card format:
Function: 


SENTNC 

SENTNC 

Monitor-dependent

Configuration-independent  (except  for  reference  to 
SNTAP) 

Command  entry  point 
$SENTENCES 

Reads words with their constituent tags from input
(or from A5 if option LOOKUP was specified). Counts
sentence and word numbers. When a sentence is read
in, the syntactic alternatives are stored in WORDT.
For each word, an entry in WORDS is created pointing
to where in WORDT the first alternative constituent
for that word is, and telling how many alternatives
there are. At the end of a sentence (** card) control
DECK NAMED
Main  program. 
Deck: 

Type: 

Function: 


ROUTINES  FOR 

Entry  points: 

Deck: 

Routine: 

Type:

Calling sequence:
Cl 


is returned to the routine which last called the monitor
(which is the main program for syntactic analysis).
This routine (deck name ANALYZ) will parse the
sentences and return directly to entry point SENTNC.


ANALYZ


No  entry  points. 

ANALYZ 
Main  program 
Configuration -independent 
Calls  monitor 

Calls monitor first to process all grammar reading,
setting of options, and other functions performed by
commands under the monitor. When the SENTENCES
card is reached, the SENTNC program will return
control to this main program at the end of each sentence.
The ANALYZ deck will parse the sentence
whose constituents are defined in the WORDS table,
creating a BIGTAB, showing all the legitimate constituents
of the sentence. It does this by combining
every pair of adjacent constituents according to the
grammar into a new constituent, by calling GETRL
with every pair of adjacent constituents. GETRL
will return the list of valid new constituents, and the
process of combination is continued until all possibilities
have been tried. Then PRANS is called to
print the results out and control is returned to
SENTNC for the next sentence to be parsed.
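The combination process can be sketched in Python as below. This is a simplified illustration, not the ANALYZ code: constituents are reduced to bare part-of-speech labels, the table entry layout is invented for the example, duplicate suppression is omitted, and get_rule is a hypothetical stand-in for GETRL.

```python
def parse(words, get_rule):
    """Exhaustively combine adjacent constituents, bottom to top.

    words: one list of alternative labels per word of the sentence.
    get_rule(left_label, right_label): list of new labels (may be empty).
    Returns a table of (left_word, right_word, label, left_sub, right_sub).
    """
    table = []
    for position, alternatives in enumerate(words):
        for label in alternatives:
            table.append((position, position, label, None, None))
    k = 0
    while k < len(table):                 # the table grows while we scan it
        for j in range(k):                # pair entry k with every earlier entry
            for left, right in ((j, k), (k, j)):
                l_entry, r_entry = table[left], table[right]
                if l_entry[1] + 1 != r_entry[0]:
                    continue              # spans are not adjacent
                for label in get_rule(l_entry[2], r_entry[2]):
                    table.append((l_entry[0], r_entry[1], label, left, right))
        k += 1
    return table


def sentence_entries(table, n_words, sentence_symbols):
    """Entries spanning the whole sentence whose label is a sentence symbol."""
    return [e for e in table
            if e[0] == 0 and e[1] == n_words - 1 and e[2] in sentence_symbols]
```

Each pair of entries is examined exactly once, when the later of the two is reached, and every new entry spans more words than either of its parts, so the loop terminates.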


DECK  NAMED  GETRL 

GETRL  DOLL 

GETRL 

GETRL 

Configuration-independent
Needs CSA grammar RULTAB in COMMON
Entry point
CAL C1

LDQ C2
TSX GETRL, 4

PZE part of speech, , tags (for left constituent)


Function:


PZE part of speech, , tags (for right constituent)
GETRL returns in the AC:

PZE address of first created constituent, , number
of created constituents (or zero if none). GETRL
finds the rule packet, if any, for the part-of-speech
pair read into it. If there is a rule packet, GETRL
starts to apply the first subrule. After applying a
subrule, GETRL examines the transfer address of
the subrule. If the subrule succeeded, the next subrule
is taken from the success transfer. If the subrule
failed, the next subrule is taken from the failure
transfer. If the appropriate transfer address is
zero, GETRL is through with the given rule packet.

It then returns the pointer to the list of new constituents
which it has generated for successful subrules
(if any).
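The success/failure chaining can be sketched in Python as follows. The packet representation (a dictionary from subrule address to a test function and two exits) is an assumption made for the example; the actual subrules are tag conditions stored in TAGTAB.

```python
def apply_packet(subrules, first, left, right):
    """Follow success and failure transfers through one rule packet.

    subrules: dict mapping address -> (test, success_exit, failure_exit),
    where test(left, right) returns a new constituent or None.
    An exit of 0 ends the packet. Returns the constituents created.
    """
    created = []
    address = first
    while address != 0:
        test, success_exit, failure_exit = subrules[address]
        result = test(left, right)
        if result is not None:
            created.append(result)      # subrule succeeded
            address = success_exit
        else:
            address = failure_exit      # subrule failed
    return created
```

A packet whose exits form a cycle would loop forever in this sketch; whether the compiled packets can contain such cycles is not stated in the text.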


Deck: 

Routine : 

Type: 

Calling  sequence: 


attribute
constituent


Function: 


GETRL 

GETATT 

Configuration -independent 

Internal  to  GETRL 

CAL  attribute 

LDQ  constituent 

TSX GETATT, 4

. . . return if attribute not found. . .

. . . return if attribute found. . .

PZE address of attribute
PZE , , address of data tag to be searched:
TAGTAB in HASH TO

Scans data tag looking for an attribute which matches
the given attribute. If not found, returns 1, 4. If
found, returns 2, 4, with the ability to call GETVAL
to get the values of this attribute.


Deck: 

Routine : 

Type: 

Calling  sequence: 


GETRL 

GETVAL 

Configuration-independent
Internal to GETRL
TSX GETVAL, 4
. . . return if exhausted all values. . .

. . . return if routine is supplying a value. . .

This routine is called after GETATT has been called
and has found an attribute on a specific data constituent.
Each call to GETVAL gets the next value on
the attribute. If there are no more values, exits 1, 4.
If there are values, exits 2, 4 with value in AC.
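The pairing of GETATT and GETVAL resembles a search followed by an iterator. A minimal Python sketch, assuming the data tag has been decoded into (is_attribute, symbol) pairs; the function names and the sample attribute names are hypothetical.

```python
def get_att(data_tag, attribute):
    """Return the position just past the matching attribute, or None.

    None plays the role of the "not found" exit; a position plays the
    role of the "found" exit and is what get_values starts from.
    """
    for index, (is_attribute, symbol) in enumerate(data_tag):
        if is_attribute and symbol == attribute:
            return index + 1
    return None


def get_values(data_tag, start):
    """Yield the values of one attribute until the next attribute begins."""
    for is_attribute, symbol in data_tag[start:]:
        if is_attribute:
            break                # values of this attribute are exhausted
        yield symbol
```

A caller would first test the result of get_att and only then iterate over get_values, mirroring the two-step calling convention described above.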


Function: 


ROUTINE FOR DECK NAMED


OPTION


System entry points:

Deck: 

Command  entry  point; 
Type: 

Command  card  format: 
Function: 


DECK  NAMED 
System  entry  point: 


TABLES  IN  COMMON 

Table:

Location:

Size:

Each entry format:


OPT  SHORT  TNODE  NOMAX  LOOKUP 
RULTAP 

OPTION 

OPT 

Monitor -dependent 
Configuration -independent 
Command  entry  point 
$OPTION parameter-1 parameter-2
Acceptable parameters are: RULTAP TNODE
LOOKUP NOMAX SHORT. When any of these
parameters is encountered, the switch with the
same name is turned on. All unspecified switches
are turned off. These switches are system entry
points which may be interrogated by other commands
or subroutines in the system. Future options may
be included by expanding the list of parameters compared
and expanding the vocabulary of entry points.
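The switch-setting behavior can be sketched in Python as follows. The text does not say what happens to an unrecognized parameter, so raising an error here is an assumption of the example, as is the function name.

```python
OPTION_NAMES = ("RULTAP", "TNODE", "LOOKUP", "NOMAX", "SHORT")


def set_options(parameters):
    """Turn on the named switches and turn every other switch off."""
    unknown = [p for p in parameters if p not in OPTION_NAMES]
    if unknown:
        # Assumption: the manual does not describe this case.
        raise ValueError("unknown option(s): " + ", ".join(unknown))
    return {name: name in parameters for name in OPTION_NAMES}
```

Adding a future option would then mean adding one name to OPTION_NAMES, which parallels expanding the list of parameters compared and the vocabulary of entry points.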


END 


END 

Must  be  present  to  tell  routine  DPROG  in  DUMP 
where  the  end  of  the  CSA  program  is. 


AREA: BIGTAB RULTAB WORDS


BIGTAB 

COMMON 

5000 words (1666 entries)

PZE right word number, , left word number

PZE right subconstituent address, , left subconstituent
address

XY0 part-of-speech address, , address of tags

(X = 1 if this entry has the same span and constituent
form as an earlier one)

(Y = 1 if another entry later in the table is marked
X = 1 by virtue of being identical in span and
constituent to this entry, and X = 0 for this
entry.)

Word numbers are stored in the form of the 2's


Table: 

Location: 

Size: 

Each  entry 


Table: 


complement of twice the number; e.g., word 15
would be stored as the 2's complement of 30 (octal 77742).

The subconstituent addresses are the 2's complements
of the relative address in BIGTAB of the
entry being referred to.
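The word-number encoding can be checked with a few lines of Python. The sketch assumes 15-bit fields; under that assumption the octal value 77742 quoted above is the 2's complement of 30, i.e. twice word number 15. The function names are hypothetical.

```python
MASK15 = 0o77777  # assumed field width: 15 bits


def encode_word_number(n):
    """2's complement (15 bits) of twice the word number."""
    return (-2 * n) & MASK15


def decode_word_number(field):
    """Inverse of encode_word_number."""
    return ((-field) & MASK15) // 2
```

Stepping through entries with a negative-going index in this way fits the 7094 convention of subtracting the index register from the address, although the text does not state that this is the reason for the encoding.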


format: 


RULTAB 

COMMON 

62 four-word entries
PZE Cl, , Cr

PZE number of accesses, , of failures

PZE address of first subrule, , number subrules
PZE NOT USED (should be dropped entirely)

Cl contains left part of speech

Cr contains right part of speech

The subrules are stored in a table called TAGTAB,
internal to the deck GRAMAR. The format of this
table is shown in description of the GRAMAR routine.


WORDS 

(See description of deck SENTNC.)




APPENDIX ID: Dictionary Assembly/Update Program Logic Manual


Table of Contents


Deck Name   Function

MONITOR     Transfer vector to start the system

CONTROL     Controls data flow by various parameters

ASSLY       Assembles and sorts dictionary records

UPDATE      Updates dictionary by a match/merge process

DICPNT      Puts out BCD dictionary records for printing

READP       Reads and prints update parameters

CARDRD      Reads dictionary input cards

INTER       Bridges common boundary between IOFMS, the
            Assembly/Update routines, and the Sort/Merge system


IB  -  pa  ge 


i  ;  .  <!  f-  h 

Ij.  V 
J  t  i  : 

7  . 

1'  :  ::ct  i  -  'll : 


ROUTINES FOR

System entry points:

Deck: 

Routine : 

Ty  pe : 

Update  iyetem 
entry  point: 
Function: 


De  ck: 
Routine : 
Type: 

Fu  nclion: 


'  :c:  '  \  >  . :  MONITOR 


.Vi  )I\ }  For  t  -[ 

v.s  > k i  re.: 

i‘  rum,  £ <-  r  v e ...  •  or 

No entry point

Configuration-independent

Must be present to transfer the machine to the assembly
and updating system. Practically a start
card. Currently transfers to PARAMS. May later
be extended to a full-fledged monitor.

DECK NAMED CONTROL

DICTPR INTAPE OUTPUT DUMMY

OUTAPE CDTAPE CARDPR DATE RUSS

CONTROL

PARAMS

Configuration-independent

PARAMS

This program reads the parameters in a STACK
whose contents are provided by SENTRD. (Uses
READP deck by TSX SENTRD, 4.) Checks for correct
parameters and prints error messages. If
parameters are incorrectly specified, program
stops to allow for corrections according to the printout
message. Undefined parameters are set by default.
Acceptable parameters of the $UPDATE card,
which cause a setting of the respective switches,
are: RUSSIA, PRCARD, NOCAPR, PRDICT,
NODIPR, NOIN, NOOUT, UPDIN, UPDOU, DATE,
DEBUG. The switches are system entry points
which may be interrogated by other commands or
subroutines in the system. Additional parameters
can be included by expanding both the list of acceptable
parameters and the vocabulary of entry points.

CONTROL 

SORTQ

Configuration-independent

Internal  to  CONTROL 
Direct  transfer  point 

If assembled update records are to be test-printed
because parameter DEBUG was specified, SORTQ
writes them all out in octal representation and quits.
If DEBUG was not specified, SORTQ defines the sort
area, sort/merge tapes, and buffer areas for the
assembled update records. These sort parameters
are interrogated by the Sort/Merge system.


ROUTINES  FOR  DECK  NAMED 


ASSLY 


System  entry  points:  INPUT  SORT  SORTM 


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


ASSLY 

INPUT 

Configuration- independent 
Entry  point 
TSX  INPUT,  4 

Assembles dictionary update records. The Sort/
Merge system will call INPUT to get an assembled
dictionary record to sort on. Before INPUT
branches back to its caller, the accumulator must
contain the address and the word count of the assembled
record. If the accumulator contains + 0,
the input record end-of-file has been reached.
INPUT interrogates the language parameter to see
whether Russian words are to be processed. If so,
all "Cyrillic" BCD words are converted into an internal
dictionary sort code to collate on Cyrillic
sequence rather than 7094 standard 9-code. This
routine reads "logical cards" of adds and deletes for
the dictionary. A logical card runs from columns
1-72, unless column 72 is punched with an 11-punch
(minus sign), in which case it includes columns 1-71
plus continuation from the next physical card. Any
number of physical cards can form a logical card as
long as a "-" in column 72 indicates that the next
card is a continuation. In a logical add card, an *
in column 1 indicates that the card contains a natural
language word that will constitute the argument
field in the corresponding assembled record. Absence
of an * in column 1 of a logical add card indicates
that the card contains a syntactic alternative
which will become the function field in the assembled
record. Add cards are preceded by a $ADD control
card. In a logical delete card, an * must appear in
column 1 followed by the word to be deleted. If
Deck:

Routine:
Type:

Calling sequence:
Function:


Deck:

Routine:

Type:


certain alternatives only are to be deleted, their
numbers (separated by commas) must follow the
word and be separated from it by at least one blank.
Delete cards are preceded by a $DELETE control
card. All assembled update records contain in their
first 36-bit word the information ADD or DEL and
the alternative number(s) (0 for master record).
Assembled records consist of an ADD or DELETE
control word with alternative number(s) (0 for master),
a date-word, an argument field (which is converted
into Cyrillic collating sequence if it is Russian),
a 36-bit zero word, and a function field (followed
by another 36-bit zero word if it is an add
entry). A $END card closes the assembly. If an
invalid update control is found, an error message
is printed out and the job is interrupted.
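The logical-card convention (columns 1-72, with a minus sign in column 72 marking a continuation) can be sketched in Python as follows. Trimming trailing blanks from the finished logical card is a choice made for the example, not something the text specifies, and the function name is hypothetical.

```python
def logical_cards(physical_cards):
    """Join physical card images into logical cards.

    Only columns 1-72 are used. A "-" in column 72 means the card
    contributes columns 1-71 and is continued on the next card.
    """
    pieces = []
    for card in physical_cards:
        field = card.ljust(72)[:72]        # columns 1-72, blank padded
        if field[71] == "-":
            pieces.append(field[:71])      # columns 1-71, then continue
        else:
            pieces.append(field)
            yield "".join(pieces).rstrip()
            pieces = []
    if pieces:                             # continuation with no final card
        yield "".join(pieces).rstrip()
```

A caller would then inspect column 1 of each logical card for the * that distinguishes a natural language word from a syntactic alternative.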

ASSLY
SORT

Configuration-independent
Entry point

TSX SORT, 4

This routine compares two records to tell the Sort/
Merge system in which sequence each record is to
be written out. Index register 1 and index register
2 contain the pointers to the two records, i.e., CAL
1, 1 will get the first word of record A and CAL 1, 2
will get the first word of record B. It must return
3, 4 if record A is greater than B, low-to-high sort,
2, 4 if record A equals B, low-to-high sort, and 1, 4
if record A is less than B, low-to-high sort. It must
return 1, 4 if record A is greater than B, high-to-low
sort, 2, 4 if record A equals B, high-to-low sort, and
3, 4 if record A is less than B, high-to-low sort. Index
registers 1 and 2 must not be destroyed before
returning. In the update procedure, this routine performs
a comparison such that the assembled dictionary
records are sorted on the natural language word,
low-to-high, as major field, on the alternative number,
low-to-high, as intermediate field, and on the
delete or add control, high-to-low, as minor field.
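The three-field comparison can be sketched in Python as follows. Records are modeled as (word, alternative number, control) tuples, and the control strings "ADD" and "DEL" are assumed to compare with DEL higher than ADD; both are assumptions of the example. The return values 1, 2 and 3 stand for the three exits of the calling sequence.

```python
import functools


def sort_exit(a, b):
    """Return 1 if A is written first, 2 if equal, 3 if B is written first.

    Fields: word (low-to-high), alternative number (low-to-high),
    control (high-to-low).
    """
    for index, high_to_low in ((0, False), (1, False), (2, True)):
        if a[index] == b[index]:
            continue
        a_first = (a[index] < b[index]) != high_to_low
        return 1 if a_first else 3
    return 2


def sort_records(records):
    """Order records with sort_exit, as the Sort/Merge system would."""
    key = functools.cmp_to_key(lambda a, b: sort_exit(a, b) - 2)
    return sorted(records, key=key)
```

With the high-to-low minor field, a delete for a given word and alternative sorts ahead of an add for the same word and alternative, which suits an update that replaces a record by deleting it and then adding the new version.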

ASSLY 
SORTM

Configuration -independent 
Entry  point 
Calling sequence: TSX SORTM, 4

Function: This routine is identical to SORT, but is only
transferred to during the final merge. All return commands
are the same as in SORT. SORTM is not used
if the dataset is to be sorted only once.


ROUTINE  FOR  DECK  NAMED  UPDATE 

System  entry  point:  UPDATE 

Deck:  UPDATE 

Routine:  UPDATE 

Type:  Configuration-independent 

Entry  point 
Direct  transfer  point 

Function: This routine is the heart of the updating program. It
calls in assembled, sorted logical update records and
logical master dictionary records, matches them,
deletes logical records, adds logical records, and
updates the dictionary file. After reading in a dictionary
record, UPDATE reads in an assembled record
and compares it with the dictionary record. If
the dictionary record is lower than the assembled
record, the matching dictionary record has not yet
been reached and the dictionary record is written unchanged
on the updated file. If the dictionary record
is higher than the assembled record, the latter is a
completely new entry and is written in its entirety on
the updated file. If the dictionary record is equal to
the assembled record, a test for DELETE or ADD is
made. If the assembled record is to delete the dictionary
entry, the next dictionary record is read in,
thereby erasing the entry to be deleted. If the assembled
record is to be added to the dictionary entry,
it is a new alternative and the dictionary entry is first
written with all alternatives unchanged on the updated
file. Only then is the new alternative record written
on the file, immediately following the last old alternative.
If a record is to be changed, it is actually
deleted, and then replaced by a newly added record.
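The match/merge logic can be sketched in Python as follows. This is a simplified model under stated assumptions: the master file is a sorted list of (word, alternatives) entries, updates are sorted (word, control, alternative) records with a delete preceding an add for the same word, and a delete removes the whole entry rather than selected alternatives. The function name is hypothetical.

```python
def update(master, updates):
    """Merge sorted update records into a sorted master dictionary."""
    out = []
    m = 0
    for word, control, alternative in updates:
        # Master entries lower than the update are written unchanged.
        while m < len(master) and master[m][0] < word:
            out.append(master[m])
            m += 1
        matched = m < len(master) and master[m][0] == word
        if control == "DEL":
            if matched:
                m += 1                  # skip the entry: it is erased
        else:
            if matched:
                # Write the old entry with all its alternatives first.
                out.append((word, list(master[m][1])))
                m += 1
            if not out or out[-1][0] != word:
                out.append((word, []))  # completely new entry
            out[-1][1].append(alternative)
    out.extend(master[m:])              # remaining entries are unchanged
    return out
```

A change to an existing entry is expressed as a DEL followed by an ADD for the same word, matching the delete-then-replace rule stated above.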


ROUTINES  FOR  DECK  NAMED  DICPNT 
System  entry  point:  DCTPN 




Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


Deck: 

Routine : 

Type; 

Calling  sequence: 
Function: 


Deck: 

Routine: 

Type; 

Calling  sequence: 
Function: 


DICPNT 

DCTPN 

Configuration -independent 
Entry  point 
CLA  IOWORD 
TSX  DCTPN,  4 

Writes BCD dictionary records on a print file. Index
register 1 must contain the file name of the BCD dictionary
printout file before this calling sequence is
executed. The file name is a decimal digit 0, 1, 2,
etc. to be used by IOFMS, a set of input-output tape
control routines not described in detail here. IOWORD
contains the address of the logical record to
be printed in its address field and the number of
words in this record in its decrement field. If the
language parameter RUSSIA was specified in the
$UPDATE card, a conversion from the internal
"Cyrillic" character collating sequence into the
standard 7094 BCD collating code will take place.
DCTPN composes a print line which is the BCD representation
of a logical dictionary record and sends
it to PPNT in IOFMS which writes it out in blocked
format. The BCD dictionary print file should be
printed on the 1401 by the 1401 PRINT PPNT PROGRAM,
but can also be printed less efficiently via
FMSXI.

DICPNT 

STORC 

Configuration- independent 
Internal  to  DCTPN 
TSX  STORC,  4 

Stores  BCD  characters  one  by  one  in  a  20-word  (120 
character)  print  line.  Accumulator  must  contain  the 
character  right-adjusted  before  one  can  transfer  to 
this  routine. 

DICPNT 

PPNOT 

Configuration-independent 
Internal  to  DCTPN 
TSX  PPNOT,  4 

Dumps  completed  print-line  buffer  of  120  characters. 
ROUTINE FOR DECK NAMED


READP 


System  entry  point: 

Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


Example: 


SENTRD 

READP 

SENTRD 

Configuration-independent
Entry  point 
TSX  SENTRD,  4 
PZE  STACK,  ,  SIZE 

The user can specify a STACK of any size he wishes.
SENTRD will read a logical card as defined above.
SENTRD looks for strings of consecutive alphabetic
characters, separated by blanks or by "break characters"
(comma, dollar sign, period, parentheses,
slash, equals sign). Each string of alphabetic characters
between breaks or blanks, as well as each individual
break character, is stored, left justified,
padded with blanks, in the next free location in
STACK. If a string is longer than 6 characters, the
components of the string are separated by a "logical
concatenator" symbol consisting of the word
767676767676. Blanks on cards, which are used only
to delimit fields of text characters, are not put into
the STACK. After the last word has been stored in
STACK, a fence of 777777777777 is stored. SIZE is
the maximum number of words (including fence)
which will be stored in the STACK. Overflow will be
lost, and the fence stored in the last location, if the
logical card has too many fields.

In  the  Assembly/Update  system  the  task  of  this  rou¬ 
tine,  which  is  also  utilized  in  the  Combinatorial 
Syntactic  Analyzer,  is  to  stack  and  edit  the  update 
parameters. 

Suppose a logical update control card contains:
$UPDATE ENGLISH, NOCAPR, PRDICT, UPDIN,
UPDOU, DATE 090966
The stack will contain:

$

UPDATE

ENGLIS

767676767676

H

,

NOCAPR

,

PRDICT

,

UPDIN

,

UPDOU

,

DATE

090966

The  first  72  characters  of  this  update  card  will  also 
be  printed  for  a  visual  check. 
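The stacking rules can be sketched in Python as follows. Each stack word is modeled as a 6-character string, the concatenator and fence are shown as their octal digit strings, and any non-blank, non-break character (including digits, as in the date field of the example) is treated as part of a string. These representation choices and the function name are assumptions of the sketch.

```python
BREAKS = set(",$.()/=")
CONCATENATOR = "767676767676"
FENCE = "777777777777"


def sentrd(card, size):
    """Split a logical card into stack words, ending with the fence."""
    stack = []
    token = ""

    def flush():
        nonlocal token
        # Strings longer than 6 characters are split into components
        # joined by the logical concatenator word.
        for start in range(0, len(token), 6):
            if start:
                stack.append(CONCATENATOR)
            stack.append(token[start:start + 6].ljust(6))
        token = ""

    for ch in card:
        if ch == " " or ch in BREAKS:
            flush()
            if ch != " ":
                stack.append(ch.ljust(6))   # break characters are stored
        else:
            token += ch
    flush()
    # At most size words including the fence; overflow is lost.
    return stack[:size - 1] + [FENCE]
```

Applied to the control card in the example, with a sufficiently large SIZE, this yields the stack listed above followed by the fence, with each comma stored as a separate entry as the description of break characters implies.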


ROUTINE FOR DECK NAMED CARDRD


System  entry  points:  CARDRD  REDSW 


Deck: 

Routine: 

Type: 


Calling  sequence: 


Function: 


CARDRD 

CARDRD 

Update -system  independent 

I/O  tape  dependent 

Entry  point 

TSX  CARDRD,  4 

PZE  USERBF, ,  LENGTH 

.  .  .  end-of-file  return.  .  . 

.  .  .  redundancy  return.  .  . 

.  .  .  normal  return.  .  . 

Reads  one  physical  card  image  into  user's  buffer 
from  an  input  tape  (whose  number  may  be  redefined 
at  any  time).  Currently  the  7094  card  input  tape 
A2  is  used. 


ROUTINES  FOR  DECK  NAMED  INTER 


System entry points: EDITX BCDCAN PHYREW WTCHA WTCHB

CHKAG  CHKBG  WA3  MEGFIL  GETREC 
PUTREC  OPENIN  OPENOU  CLOSOU 


Deck: 

Routine: 

Type: 

Calling  sequence: 


INTER 

EDITX 

7094  tape  I/O  dependent 
Entry  point 
TSX EDITX, 4

.  .  .  EDIT  commands  (see  IBM  Research  Computing 
Center  edit  manual).  .  . 
Function:


Deck:

Routine:

Type:

Calling sequence:
Function:


Deck:

Routine:

Type:

Calling sequence:
Function:


Deck:

Routine:

Type:


Calling sequence:


Checks whether Channel A is still in operation when
a user wants to use it for EDIT. Since, in addition
to EDIT, IOFMS, Sort/Merge, and CARDRD also use
Channel A, their Channel-A operation must be completed
before EDIT can use it. EDITX serves to finish
up such an operation, if it is in progress. It also
saves the redundancy status of Channel A for the
card read routine by setting a switch which is tested
in CARDRD.

INTER

MEGFIL

7094  tape  I/O  dependent 

Entry  point 

TSX  MEGFIL,  4 

PZE  USERBF,  ,  LENGTH 

Will get one logical record from the merge file and
put it into the user's buffer. The address of the
user's buffer and its LENGTH must be specified in
the control word following the TSX instruction. Uses
TSX SORTX, 4 to get the next following logical record
from either one of two files to be finally merged.

INTER 

GETREC 

7094  tape  I/O  dependent 

Entry  point 

TSX  GETREC,  4 

PZE USERBF, , LENGTH

Will get one logical record from an input file and put
it into the user's buffer. The address of the user's
buffer and its LENGTH must be specified in the control
word following the TSX instruction. Uses TSX
READ, 4 to get a logical input record.

INTER
PUTREC

7094 tape I/O dependent

Entry point

TSX PUTREC, 4

PZE USERBF, , LENGTH

Will take one logical record as defined by the control
word following the TSX instruction and put it on the
output file. Uses TSX WRITE, 4 to output the logical
record. If the parameter PRDICT appears on the
$UPDATE card, PUTREC also gives the dictionary


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


Deck: 

Routine: 

Type: 

Calling  sequence: 
Function: 


Deck: 

Routine: 

Type:


BCD  print  routine  the  location  of  the  logical  record, 
so  that  the  latter  can  put  it  (in  BCD  form)  on  a  print 
file. 

INTER 

OPENIN 

7094 tape I/O dependent
Entry point
CAL INTAPE
TSX OPENIN, 4

Tells the input/output tape control routines
which  input  file  is  to  be  opened.  The  name  of  the 
file  must  be  in  the  accumulator  before  one  can  TSX 
to  this  routine.  In  the  Assembly/Update  system, 
the  input  master  dictionary  file  has  to  be  opened. 

Deck:              INTER
Routine:           OPENOU
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  CAL  OUTAPE
                   TSX  OPENOU, 4
Function:          Tells IOFMS which output file is to be opened. The
                   name of the file must be in the accumulator before
                   one can TSX to this routine. In the Assembly/Update
                   system, the output master dictionary file and, if
                   parameter PRDICT (print dictionary) was stated, the
                   output BCD print dictionary file have to be opened.

Deck:              INTER
Routine:           BCDCAN
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  CAL  file name
                   TSX  BCDCAN, 4
Function:          Returns in the accumulator the 15-bit internal
                   address of the device possessing the given file name.
                   The file name must be in the accumulator before one
                   can TSX to this routine. Only used for compatibility
                   with the CSA system. In the Assembly/Update system,
                   BCDCAN transfers immediately back to the caller. See
                   also the BCDCAN routine in the CSA program logic
                   manual.

Deck:              INTER
Routine:           PHYREW
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  CAL  file name
                   TSX  PHYREW, 4
Function:          Rewinds a tape with a file whose name is in the
                   accumulator at the time this routine is entered. Only
                   used for compatibility with the CSA system. In the
                   update routine, PHYREW transfers immediately back to
                   the caller. Rewinding operations are initiated by
                   the routine CLOSOU (which see).

Deck:              INTER
Routine:           CLOSOU
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  CAL  file name
                   TSX  CLOSOU, 4
Function:          Closes all output operations, initiates rewinding of
                   tapes and terminates I/O subroutines. Uses ENDIO of
                   IOFMS. Logs all record counts (logical and block
                   records, all frame numbers, total record counts of
                   all files).

Deck:              INTER
Routine:           WTCHA
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  TSX  WTCHA, 4
Function:          Makes Channel A available to the input/output tape
                   control routines IOFMS. Called normally from the I/O
                   and transfers to the Sort/Merge system.

Deck:              INTER
Routine:           WTCHB
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  TSX  WTCHB, 4
Function:          Makes Channel B available to IOFMS. Called normally
                   from I/O and transfers to the sorting routine.

Deck:              INTER
Routine:           CHKAG
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  TSX  CHKAG, 4
Function:          Makes Channel A available to the Sort/Merge system.
                   Called normally from the Sort/Merge system and
                   transfers to IOFMS.

Deck:              INTER
Routine:           CHKBG
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  TSX  CHKBG, 4
Function:          Makes Channel B available to the Sort/Merge system.
                   Called normally from the Sort/Merge system and
                   transfers to IOFMS.

Deck:              INTER
Routine:           WA3
Type:              7094 tape I/O dependent
                   Entry point
Calling sequence:  TSX  WA3
Function:          Is a request by the Sort/Merge system to write out
                   messages. Transfers to the write routine W 103 of
                   IOFMS.


II. 


THE  CSA  RUSSIAN  GRAMMAR:  LINGUISTIC  RESEARCH  AND 
RELATED  LANGUAGE  PROCESSING  ACTIVITIES 


2.0  Introduction

This section of the report describes that portion of contract-supported
linguistic research and related language processing which was directed
primarily towards the development of a Russian surface grammar for the
CSA system. The main results of the linguistic research on Russian
grammar, which involved a cyclical process of formulation, testing, and
revision of parsing rules in CSA format, are presented in Section 2.1.
Section 2.2 describes an independent exploratory study on
subclassification of Russian parts of speech which was conducted with
the assistance of Library of Congress personnel. Related language
processing activities, consisting principally of Russian Master
Dictionary (RMD) updates and processing of a large Russian text corpus
preliminary to obtaining statistical information, are summarized in
Section 2.3.

2.1  Grammatical Research on Russian for CSA

The central focus of the linguistic research conducted under the
contract was a study of Russian surface structure phenomena leading to
the development and testing of grammar rules for parsing Russian
sentences. The results of this study are reflected in part in two
experimental Russian grammars, RG1 and RG2, expressed in the tag
metalanguage of the CSA system (cf. Section 1.2). RG1, a relatively
small grammar employed largely for system debugging purposes, was based
on only the first 500 words (20 sentences) of a 160-sentence sample
drawn randomly from a 30,000-word corpus (1600 sentences) of Pravda
editorials. Since RG1 has been entirely superseded by RG2, its
considerably more extensive successor, only the latter is described in
this report.

The second experimental grammar, RG2, based on both the entire
160-sentence Pravda sample and a variety of references on Russian
grammar, constitutes a relatively extensive preliminary set of grammar
rules for surface structure recognition of Russian sentences. Its
formulation was regarded as one stage in a cyclical process consisting
of formulation, testing, and revision of grammar rules. Consistent with
this approach, a number of recurrent linguistic problems that were
encountered in testing the rules of RG2 were subsequently investigated
in greater detail, an activity which in a number of instances led to the
formulation of new rules or the revision of existing ones. The present
section includes a discussion of a representative sample of RG2 rules
grouped according to major grammatical topics. Where applicable, the
discussion of a given topic includes a summary of the findings of
subsequent linguistic investigations and presents any sets of new or
revised rules that have been tentatively proposed as a result.

2.1.1  Definitions and Underlying Assumptions

Since Soviet works on Russian grammar were referred to extensively in
the linguistic research on the CSA Russian grammar, a number of terms
and concepts employed in those sources have been adopted in this section
of the report. Because of the likelihood that they will be unfamiliar to
many readers, explanations of concepts and definitions of terms are
provided here along with a statement of certain assumptions involved in
the present study.

As described in Section 1.1 of this report, prior to analysis each
"word" in a sentence, including sentential punctuation,¹ is assigned a
set of mutually exclusive syntactic alternatives, each of which is in
the form of a structured symbol consisting of a part-of-speech name²
followed by a (possibly null) string of tags. A complete analysis of a
sentence is one in which systematic combination of adjacent syntactic
alternatives and higher-order constituents according to the rules of a
grammar results in formation of a sentence constituent S which spans the
entire input sentence. In the CSA Russian grammar, it is assumed that
each such S immediately dominates a constituent PRED, together with any
associated sentential punctuation. The PRED constituent, composed of the
subject and predicate of the sentence, is referred to in this report as
the predication constituent or predication. All other constituents are
collectively referred to as non-predicative constituents.
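
The notions of structured symbol and complete analysis introduced above
can be illustrated by a short sketch. It is not part of the CSA system:
the class Constituent, its field names, the helper complete_analyses,
and the particular tag values chosen are assumptions made only for this
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    """A structured symbol: a name followed by a (possibly empty) string of tags."""
    name: str          # part-of-speech or constituent name, e.g. "N", "PRED", "S"
    tags: tuple = ()   # attribute/value pairs, e.g. (("CP", "A"), ("NG", "M"))
    start: int = 0     # index of the first word covered (hypothetical field)
    end: int = 0       # index one past the last word covered (hypothetical field)

    def value(self, attribute, default=None):
        """Return the value of a tag attribute, or default if the tag is absent."""
        return dict(self.tags).get(attribute, default)


def complete_analyses(chart, sentence_length):
    """Select the S constituents that span the entire input sentence."""
    return [c for c in chart
            if c.name == "S" and c.start == 0 and c.end == sentence_length]


# A toy chart for a four-"word" sentence (three words plus final punctuation).
chart = [
    Constituent("N", (("CP", "A"), ("NG", "M")), 0, 1),
    Constituent("PRED", (), 0, 3),
    Constituent("S", (), 0, 4),   # PRED plus sentential punctuation
    Constituent("S", (), 1, 4),   # an S that does not span the whole sentence
]
spanning = complete_analyses(chart, 4)
```

Only the S covering positions 0 through 4 counts as a complete analysis
here; the second S is a partial result of the kind an exhaustive
bottom-to-top procedure may also build.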

In discussing various members of the set of part-of-speech classes, the
distinction commonly drawn between lexical and function³ words will
occasionally be employed. In addition, an analogous distinction will be
made between multi-word constituents that can function by themselves as
higher-order constructions and those that can function only as
constituents of higher-order constructions. The necessity of defining
constituents of the latter type, which will be called
quasi-constituents, is primarily a by-product of the binary branching
format required by the current version of the metalanguage. For example,
in parsing a coordinative construction like "Peter, John and Mary" it is
necessary at some point to create the quasi-constituents ", John" and
"and Mary" which are not higher-order constructions in a linguistic
sense but combine with the constituent "Peter" to form one.

Two lexical words (or any two non-predicative constituents) linked by
predicative agreement (i.e., subject-predicate agreement) form a
predication; if they stand in apposition to one another, they constitute
an apposition⁴; if linked by non-predicative agreement, government, or
adjoinment (primykanie), they form endocentric word groups
(slovosochetanie⁵). Constituents not linked to any other constituent in
one of the ways just enumerated (sometimes referred to as "parenthetic"
constituents⁶) are said in the Soviet literature to be unrelated to the
surface structure of the sentence.

Lexical words, predications, appositions, and word groups can be
conjoined by coordinative and subordinative conjunctions, in which case
the resultant constituents are referred to as compound and complex,
respectively.⁷ In certain situations, strings of adjectives, nouns or
numerals whose members are identical in syntactic function can form
accumulative strings.⁸ For instance, in BOL6WO1 BELY1 KAMENNY1 DOM
('a large white stone house') the adjectives form such a string.

Two  function  words  can  produce  only  a  quasi-constituent.  However, 
there  are  three  basic  possibilities  for  combining  a  function  word  with  a 
lexical  word,  a  predication,  an  apposition,  or  an  endocentric  word  group: 

1.  a  preposition  combines  with  its  governed  complement  to  form  a  prepo¬ 
sitional  phrase  whose  name  is  additionally  qualified  by  the  class  of  the 
complement  (e.g.,  preposition-ordinal  phrase,  preposition-noun  phrase); 

2.  conjunctions  combine  with  predications  to  produce  clauses  identified 
by  the  class  of  conjunction  as  either  coordinate  or  subordinate; 

3.  conjunctions in combination with non-predicative constituents form
coordinate or subordinate phrases. (The term phrase is also employed
in referring to accumulative strings and is used interchangeably
with the term word group.)

Subordinate clauses and phrases are typically set off by paired commas;
coordinate clauses and phrases, where appropriate, are separated by
commas and other punctuation. Constituents of predications and of word
groups linked by government cannot be separated or set off by
punctuation. Appositions and word groups linked by agreement or
adjoinment are typically not punctuated. However, dependent constituents
in these constructions can be set off by commas or equivalent detaching
punctuation from the dominant constituent in the respective group. A
constituent set off in this way from the dominant word is referred to as
detached. The following contains a detached participial phrase: STOL,
POKRYTY1 GAZETO1, ... ('a table covered with a newspaper ...').

Constructions and constituents which are formed by predicative ties,
appositive ties, government, agreement, or adjoinment and which do not
contain⁹ any punctuation or conjunctions are referred to as simple; all others


2.1.2  Restrictions on the Scope of RG2


In addition to being confined to surface structure phenomena, RG2 was
subject to restrictions of two other kinds: (a) restrictions on the
range of construction types covered and (b), within certain of the types
considered, restrictions on the nature of coverage as a result of
various simplifying assumptions. The principal restrictions in the
former category were the following. The only sentence types considered
were declarative sentences containing neither direct quotes nor strings
of characters of formal or natural languages other than Russian. The
main focus of coverage in RG2 is placed on simple word groups and
predication constituents. A limited capacity for recognizing appositions
and compounds was also included. Subordinate clauses, generally limited
to relative KOTORY1-clauses and to CTO- and CTOBY-clauses, were dealt
with only on a token basis. Recognition of a limited number of detached
constituent types was also attempted, but proved to be a difficult task
because of the need for additional work on recognition of associated
punctuation (mainly separating and detaching commas). In the course of
testing the grammar on the 160-sentence Pravda editorials sample, a few
ad hoc rules were introduced to bridge gaps in the coverage of the
grammar. These temporary rules were clearly labeled as ad hoc; since
they are linguistically uninteresting, they are not discussed further in
this report.

The  most  significant  simplifying  assumption  involved  the  treatment 
of  preposition-noun  phrases,  which  were  arbitrarily  linked  to  left-adjacent 
potential  governors  with  the  following  exceptions.  Verbs  and  predications 
were  allowed  to  combine  with  preposition-noun  phrases  on  either  side  and 
were  given  preference  in  instances  where  a  prepositional  phrase  had  more 
than  one  adjacent  potential  governor.  The  rules  governing  combination  of 
adverbs  with  adjacent  constituents  were  also  considerably  oversimplified, 
due  to  the  absence  of  detailed  information  on  subcategorization  and  selection 
restrictions. 

Another  area  where  major  simplifications  were  made  was  that  of 
redundant  analyses,  i.e.,  sets  of  analyses  such  that  the  availability  of  one 
member  is  sufficient  to  predict  the  nature  of  the  remaining  members. 
Whether  such  analyses  were  trivially  different*^  (i.e.,  did  not  correspond 
to  significantly  contrasting  meanings)  or  differed  in  important  respects,1^ 
single  structural  descriptions  were  arbitrarily  assigned  to  some  of  the  more 
frequent  patterns  in  order  to  avoid  unnecessarily  voluminous  output. 

Word order restrictions -- except for the more obvious instances, such
as preposition-noun sequence, adjective-noun sequence, and (cardinal
numeral)-noun sequence, as well as the artificial restrictions on the
position of preposition-noun phrases relative to their "governors" --
were not imposed in the rules of RG2; nor could the linguistic problems
of word order be adequately explored during the contract period.¹³
Accordingly, constructions like OTVET $AVSTRII $VENGRII are given two
analyses, one of which is correct ('reply of Austria to Hungary'), the
other incorrect ('reply of Hungary to Austria').¹⁴

2.1.3  Employment of the Metalanguage in RG2

In  view  of  the  flexibility  of  the  metalanguage  (Section  1.2),  a  number 
of  alternatives  are  in  general  available  for  expressing  the  same  linguistic 
facts  in  the  form  of  CSA  grammar  rules.  Before  turning  to  a  description  of 
individual  rules,  two  instances  where  the  choice  of  a  particular  alternative 
in  RG2  affected  broad  segments  of  the  grammar  will  be  discussed.  The  first 
instance  involved  two  measures  designed  to  reduce  the  number  of  tags  asso¬ 
ciated  with  most  constituents: 

1.  The  case  and  person  attributes  were  combined  into  a  single  abstract 
attribute  CP,  necessitating  a  split  of  the  traditional  nominative  case 
into  three  versions  --  those  of  the  first,  second,  and  third  person; 

2.  Number  and  gender  were  combined  in  similar  fashion  into  an  attri¬ 
bute  NG  with  values  plural  (P),  masculine  (M),  feminine  (F),  and 
neuter  (N). 

While the decision to combine case and person was not without benefit,
in retrospect it would have been better to keep number and gender
separate, because in instances where only number agreement is required,
duplication of rules results.¹⁵ Moreover, the loss of gender
distinctions in the plural is disadvantageous in some situations.¹⁶

More  serious  practical  difficulties  arose  in  the  second  instance  as  a 
result  of  the  approach  chosen  for  enforcing  a  particular  order  of  concatena¬ 
tion  in  endocentric  constructions  --  that  of  renaming  constituents.  The  two 
following  equivalent  sets  of  rules  for  a  hypothetical  constituent  "A"  illustrate 
the  major  alternatives. 

(1)  A + A = AP
     A + AP = AP

(2)  A PHRASE/* + A = A PHRASE/PLUS

Comment: The resultant constituent (C3) in (1) is called an A-phrase
(AP). The C3 in (2) is an A which has the attribute "phrase"
(PHRASE/PLUS). The first constituent (C1) in (2) can be any A, but the
attribute PHRASE must either have no value or not be present at all, as
indicated by the *.

In the constituent renaming approach (1) it is assumed that there are no
rules of the form AP + A = AP and AP + AP = AP in the grammar. Moreover,
not only are two rules necessary to accomplish what can be done in (2)
by a single rule, but also almost every rule in which A enters as a
constituent must be duplicated to take care of the constituent AP. An
analogous problem arises when a constituent V (verb) can govern several
constituents N (noun) on either side. Rules of the type V + N = V and
N + V = V would produce multiple redundant analyses of a string like
N V N N. In order to avoid this, the following equivalent rule types are
possible.

(3)  V + N = VP
     VP + N = VP
     N + V = V

(4)  V + N = V PHRASE/PLUS
     N + V PHRASE/* = V

Comment: In (3) VP stands for some constituent "verb phrase". In (4),
the class name of the constituent is not changed and VP is written as
V PHRASE/PLUS with a resultant saving in the number of rules in which
a verb or verb phrase is either produced or combined with another
constituent.

While the duplication of rules is avoided in approaches (2) and (4),
the additional tags proliferate so that the rules become extremely long
and correspondingly difficult for a human to interpret. The approach
illustrated in (1) and (3) was selected for RG2 with an understanding of
some of its consequences, but without full knowledge of what their
magnitude might be.¹⁷

The device of ordering of subrules (Section 1.2.5) was not employed in
RG2 because it was written before this feature was added to the
metalanguage.
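
The redundancy attributed above to rules of the type V + N = V and
N + V = V, and its removal by the renaming approach (3), can be checked
with a small counting sketch. The function count_analyses and the
representation of rules as a dictionary are assumptions made for this
illustration and do not reproduce the CSA recognition routine; the
procedure simply counts the distinct binary bracketings of a string that
a set of context-free binary rules admits.

```python
from collections import defaultdict


def count_analyses(words, rules):
    """Count distinct binary analyses of the whole string, per resulting name.

    rules maps a pair (left name, right name) to the name of the result.
    """
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][word] = 1
    # Combine adjacent spans bottom-to-top, shortest spans first.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for left, n_left in list(chart[(i, j)].items()):
                    for right, n_right in list(chart[(j, k)].items()):
                        result = rules.get((left, right))
                        if result is not None:
                            chart[(i, k)][result] += n_left * n_right
    return dict(chart[(0, n)])


# Unrestricted rules: V + N = V and N + V = V.
naive_rules = {("V", "N"): "V", ("N", "V"): "V"}

# Renaming approach (3): V + N = VP, VP + N = VP, N + V = V.
renaming_rules = {("V", "N"): "VP", ("VP", "N"): "VP", ("N", "V"): "V"}

naive = count_analyses(["N", "V", "N", "N"], naive_rules)
renamed = count_analyses(["N", "V", "N", "N"], renaming_rules)
```

Under these assumptions the unrestricted rules yield three analyses of
N V N N, differing only in the order in which the verb absorbs its
nouns, whereas the rules of (3) yield a single analysis.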

2.1.4  The CSA Russian Grammar

In order to present a representative sample of RG2 rules, while at the
same time illustrating the results of further investigation of
linguistic problems, the description of the CSA Russian grammar is
organized according to the major grammatical topics addressed. The
description begins (A) with a relatively extensive treatment of the
principal types of noun phrases, prepositional phrases, and their
components, followed by discussions on the handling of predications (B)
and compound constituents (C). It concludes (D) with a brief analysis of
problems encountered in the recognition of punctuationally delimited
constituents followed (E) by a summary of activity on the lexicon
developed in support of RG2.

A.  Noun  Phrases,  Prepositional  Phrases,  and  Their  Components 

In the case of noun phrases and their components, the discussion will
be restricted to constituents resulting from the combination of
long-form adjectives, cardinal numerals, and nouns, as well as of
constituents resulting from such combinations. Constructions involving
the combination of constituents of the same part-of-speech class will be
considered first. These include: (1) adjective strings, (2) cardinal
numeral strings, and noun-noun constructions, the latter being
subdivided into (3) noun-noun predications, (4) close appositions, and
(5) word groups formed by a governing noun. The remainder of the
discussion is concerned with constructions involving two or more
distinct parts of speech, including: (6) nouns with cardinal numerals,
(7) nouns with adjectives, (8) nouns with cardinal numerals and
adjectives, and (9) preposition-noun phrases.

1.  Adjective  Strings 

Only long-form adjectives of the types shown in Table II-1 are
considered here.¹⁸ Single adjectives and unpunctuated strings of
adjectives, i.e., accumulative strings, modify a noun successively;
that is, the bracketing is of the general form (A (A (A N))), where A
and N stand for adjective and noun, respectively. Since, however,
adjectives can form a variety of other strings,¹⁹ the accumulative
strings are assigned the following type of bracketing, ((A (A A)) N), in
order to make the treatment of adjectival strings uniform.

Table II-1: Subclasses of Adjectives Distinguished in RG2.²⁰

                        Long-form Adjectives (A)

Subclasses                              Tags Used  Examples
--------------------------------------  ---------  ---------------------------
Adjectives   Degree of    Positive      SC/A       SINI1 ('blue')
proper       comparison   Comparative   SC/AC      MEN6WI1 ('lesser')
                          Superlative              NOVE1WI1 ('newest')
Participles  Active       Present       SC/LAN     PIWU5I1 ('writing')
                          Past          SC/LAP     PISAVWI1 ('who wrote')
             Passive      Present       SC/LPN     CITAEMY1 ('being read')
                          Past          SC/LPP     POLUCENNY1 ('received')
Pronominal adjectives (pronouns         SC/P       NAW ('our'), INO1 ('other')
acting as adjectives)

RG2  rules 

The existing rules require that adjectives agree in case (CP) and
number-gender (NG) (5). As already noted in Section 2.1.3, the resultant
adjectival phrase (AP) requires the existence of another rule (6). In
both (5) and (6), attributes not mentioned in the rules should be copied
from the C2 (ETC/2).

(5)    A  CP/X NG/Y
     + A  CP/X NG/Y
     = AP CP/X NG/Y ETC/2

(6)    A  CP/X NG/Y
     + AP CP/X NG/Y
     = AP CP/X NG/Y ETC/2

Proposed  rules 

In line with suggestions made in Section 2.1.3, the above rules can be
replaced by (7). The tag AAX1, whose meaning is "the first exclusion
tag in adjective-adjective rules", is used to enforce right-branching
bracketing.

(7)    A CP/X NG/Y AAX1/*
     + A CP/X NG/Y
     = A CP/X NG/Y AAX1/PLUS ETC/2

Note: Additional restrictions on the type of adjectives admitted to
this rule would have to be included at the time of actual implementation.
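
The effect of the exclusion tag in (7) can be traced with a brief sketch.
It is an illustration only: constituents are represented as Python
dictionaries, the key "word" and the functions combine_adjectives and
bracketings are hypothetical, and the tag values CP/A and NG/M are
chosen arbitrarily. The sketch tries every binary bracketing of a string
of three agreeing adjectives and keeps those which rule (7) admits.

```python
def combine_adjectives(c1, c2):
    """Apply rule (7); return the resulting constituent, or None if it fails."""
    if c1["name"] != "A" or c2["name"] != "A":
        return None
    if c1.get("AAX1") is not None:      # AAX1/* on C1: the tag must be absent
        return None
    if c1["CP"] != c2["CP"] or c1["NG"] != c2["NG"]:
        return None                     # agreement in case and number-gender
    result = dict(c2)                   # ETC/2: copy remaining attributes of C2
    result["AAX1"] = "PLUS"
    return result


def bracketings(items):
    """Return (constituent, bracketing) pairs for every admissible analysis."""
    if len(items) == 1:
        return [(items[0], items[0]["word"])]
    analyses = []
    for split in range(1, len(items)):
        for left, left_text in bracketings(items[:split]):
            for right, right_text in bracketings(items[split:]):
                combined = combine_adjectives(left, right)
                if combined is not None:
                    analyses.append((combined, f"({left_text} {right_text})"))
    return analyses


adjectives = [{"name": "A", "CP": "A", "NG": "M", "word": w}
              for w in ("a1", "a2", "a3")]
admitted = [text for _, text in bracketings(adjectives)]
```

Because the result of (7) carries AAX1/PLUS and the first constituent
must lack that tag, the left-branching analysis ((a1 a2) a3) is blocked
and only (a1 (a2 a3)) survives.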

2.  Cardinal  Numeral  Strings 

The subclasses of numerals in RG2, derived according to the nature
of their ties with nouns, are shown in Table II-2.

RG2  rules 

RG2 rules, which only partially reflect the agreement requirements
stated in Table II-2, are presented with a minimum of comment, since
their limitations will become apparent in the course of the discussion
that follows.

(8)    C  CP/X
     + C  CP/X SC/Y-M
     = CP CP/X SC/Y ETC/2

Comments: If two cardinal numerals (C) agree in case (CP) and C2 is
of a subclass (SC) other than M (million and higher), they can be
combined to form a cardinal numeral phrase (CP) which is in the same
case and which acquires the subclass of the rightmost (C2) numeral as
well as all of its other attributes not mentioned in the subrule
(ETC/2).
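
The reading of subrule (8) given in the comments can be restated as a
short sketch. The dictionary representation of constituents and the
function name apply_rule_8 are assumptions made for this illustration;
note that "CP" serves in the source both as the case attribute and as
the name of the resulting cardinal numeral phrase, and the sketch keeps
the two apart by storing the constituent name under the key "name".

```python
def apply_rule_8(c1, c2):
    """Combine two cardinal numerals according to subrule (8), or return None."""
    if c1["name"] != "C" or c2["name"] != "C":
        return None
    if c1["CP"] != c2["CP"]:     # CP/X on both constituents: same case
        return None
    if c2["SC"] == "M":          # SC/Y-M: any subclass except M
        return None
    result = dict(c2)            # ETC/2: remaining attributes come from C2
    result["name"] = "CP"        # the result is a cardinal numeral phrase
    return result


# Hypothetical entries for a two-numeral string such as "twenty five".
twenty = {"name": "C", "CP": "A", "SC": "S5"}
five = {"name": "C", "CP": "A", "SC": "S5"}
million = {"name": "C", "CP": "A", "SC": "M"}

phrase = apply_rule_8(twenty, five)
blocked = apply_rule_8(five, million)
```

The variable X corresponds to the equality test on case, and the
exclusion Y-M to the test on the subclass of C2; a following SC/M
numeral is left to subrules (9) and (10).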

(9)    C  CP/X-G,D,I,L GC/Y
     + C  SC/M CP/Y
     = CP SC/M CP/X GC/C ETC/2

Comments: Any cardinal numeral whose case is other than genitive,
dative, instrumental or locative (hence, accusative or nominative) can
govern an SC/M numeral in the corresponding case (GC/Y and CP/Y
on C1 and C2, respectively). The resultant CP is an SC/M numeral,
acquires the case of C1, and can govern the genitive (GC/C).


(10)   C  CP/X-N,A
     + C  SC/M CP/X
     = CP SC/M CP/X ETC/2

Comment: This subrule states that any cardinal numeral whose case
is other than nominative or accusative can be combined with an
immediately following SC/M numeral agreeing with it in case.

Table II-2: Subclassification of Cardinal Numerals in RG2

Subclass   Definition of          Comments about numeral-noun
of the     mnemonics              agreement and government²¹
numeral
---------  ---------------------  ------------------------------------------
SC/S1²²    Numeral 1 and all      Numerals in this category agree in case,
           numerals ending in     number and gender with the noun to which
           -1, except 11.²³       they refer. E.g., SOROK ODIN STOL
                                  ('forty-one tables').

SC/S2      Numeral 2 and all      In the nominative (and in the accusative
           numerals ending in     if identical with the nominative) the
           -2, except 12.         numeral governs the noun in genitive
                                  singular and agrees in gender with the
                                  noun: DVA STOLA ('two tables'); DVE DAMY
                                  ('two ladies').

SC/S3      Numeral 3 or 4 and     Preceding comment applies. However,
           all numerals ending    there is no agreement in gender: STO
           in -3, -4, except      CETYRE STOLA ('104 tables'); STO
           13 and 14.             CETYRE DAMY ('104 ladies').

SC/S5      Numerals 5-19,         Preceding comment applies. However,
           even tens (20, 30,     the governed noun must be in the genitive
           ...), and all those    plural. E.g., P4T6 STOLOV ('five
           ending in -5 to -9.    tables').

SC/H       Even hundreds or       Preceding comment applies. However, the
           numerals ending in     "nounness" of STO ('hundred') comes to the
           even hundreds.         fore and in oblique cases, typically
                                  instrumental, a government variant is
                                  possible (noun in the genitive plural):
                                  DVUM4STAMI RUBL4MI/RUBLE1 ('200 rubles').

SC/T       Even thousands or      Since TYS4CA ('thousand') can be both a
           numerals in even       numeral and a numeral noun, the tendency
           thousands.             noted immediately above is fully within
                                  the literary norm²⁴: TYS4CE STOLOV
                                  ('thousand tables').

SC/M       Even millions,         These numerals behave like nouns and
           billions, etc.         govern nouns in the genitive plural²⁴:
                                  MILLION STOLOV ('million tables').
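
The subclass definitions of Table II-2 lend themselves to a small
worked example. The sketch below assigns a subclass mnemonic to a
positive integer; it is not taken from RG2, where subclasses are coded
in the dictionary entries of the individual numeral words. The function
name numeral_subclass is hypothetical, and the sketch assumes that
"ending in even hundreds" (thousands, millions) can be read as
divisibility by 100 (1,000, 1,000,000), with the largest unit taking
precedence.

```python
def numeral_subclass(n):
    """Return the Table II-2 subclass mnemonic for a positive integer n."""
    if n <= 0:
        raise ValueError("only positive cardinal numerals are classified")
    if n % 1_000_000 == 0:
        return "M"           # even millions, billions, etc.
    if n % 1_000 == 0:
        return "T"           # even thousands
    if n % 100 == 0:
        return "H"           # even hundreds
    if 11 <= n % 100 <= 19:
        return "S5"          # 11-19 are excluded from S1, S2 and S3
    last_digit = n % 10
    if last_digit == 1:
        return "S1"
    if last_digit == 2:
        return "S2"
    if last_digit in (3, 4):
        return "S3"
    return "S5"              # even tens and numerals ending in 5-9


examples = {n: numeral_subclass(n) for n in (41, 2, 104, 5, 12, 30, 200, 2000)}
```

The examples of Table II-2 come out as expected under this reading:
41 (SOROK ODIN) falls in S1, 104 (STO CETYRE) in S3, and 5 (P4T6) in S5.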

RG2 contains a CP + C = CP type rule with three subrules having the same
restrictions as those illustrated in (8)-(10) above. The left-branching
bracketing thus enforced ((C_n C_n+1) C_m) could be easily obtained in
another way if the following additional restrictions were introduced
into (8)-(10):

(11)   C ...
     + C ... CCX1/*
     = C ... CCX1/PLUS

Comments: CCX1 is a locally important tag condition which can be
called "first exclusion tag in numeral-phrase rules". This is the
same approach employed in (7), where right-branching bracketing is
enforced.

Moreover, if the rules modified as shown in (11) were ordered as in (12)
and (13), it would be possible to eliminate subrule (10). Note that some
minor adjustments have been made; deletions are shown in brackets and
additions are underlined to facilitate comparison with (8) and (9).

(12)   C CP/X
     + C CP/X [SC/Y-M] CCX1/*
     = C CP/X [SC/Y] CCX1/PLUS ETC/2 (S/QUIT F/(13))

(13)   C CP/X-G,D,I,L GC/Y
     + C SC/M,T SC/Z CP/Y CCX1/*
     = C SC/Z CP/X GC/C CCX1/PLUS ETC/2 (S/QUIT F/QUIT)

Comments: The S and F "tags" are actually instructions which indicate
which subrule the recognition routine should process next in cases of
"success" or "failure" of the given subrule. The "value" QUIT indicates
that the search pass through the subrule packet should be terminated.
The "value" (13) of F in (12) is meant to indicate that subrule (13) in
this section is to be processed next if subrule (12) fails. In actual
practice, the value of each success or failure transfer (S or F), as
well as the subrule identifier to which it refers, would have to be a
symbol of the tag metalanguage and hence would have to begin with an
alphabetic character. The latter restriction will not be observed in
this presentation. The repetition of SC in the C2 line in (13), first
with a list of acceptable constants and then with a variable, is an
alternate method of enforcing the type of restriction illustrated by the
value exclusions of CP in the C1 line of the same subrule. Additions and
deletions in (12) and (13) involving items other than CCX1, S, and F
will remedy certain defects of (8) and (9). In order to correct other
limitations of the existing rules, a set of rules similar to the
following might be proposed.


The  distinctions  expressed  in  Table  II- 2  reflect  only  the  grammatical 
properties  of  numerals.  In  order  to  insure  that  only  lexically  permissible 
sequences  of  numerals  are  combined,  as  well  as  for  some  other  purposes,2** 
it  is  necessary  to  introduce  an  attribute  which  will  be  called  RANK.  The 
values  of  this  attribute  and  the  groups  of  numerals  to  which  they  are  assigned 
are  shown  in  Table  II- 3. 

Table II-3: Values of the Attribute RANK

Group A: numerals                     Group B: numeral nouns

Value of RANK   Numerals              Value of RANK   Numerals
-------------   ------------------    -------------   -------------
ONES            1-9                   THOUSD          1,000
TEENS           10-19                 MILION          1,000,000
TENS            20, 30, ..., 90       BILION          1,000,000,000
HUNDRD          100, 200, ..., 900

The normally permissible order of concatenation of numerals would
require the value of RANK on successive constituents to be restricted as
shown in Table II-4.

Table  2-4:  Restriction  on  Concatenation  of  Numerals 

The  following  values  ot  RANK  cannot  appear  on  C1 
when  the  value  of  the  same  attr.^ute  on  C2  i*  *• 
shown  in  the  next  column _ _ 

ONES,  TEENS 
ONES,  TEENS,  TENS 
ONES,  TEENS,  TENS 
ONES,  TEENS,  TENS,  HUNDRD 
THOUSD 

THOUSD,  UILION 
THOUSD,  MIUON,  BLLICN 

When expressed in CSA rule format, the above restrictions give rise to the
packet of ordered subrules presented immediately below, (14.1)-(14.6). All
of the subrules involve partial tests and therefore result in a dummy
constituent (DUMMY) which is introduced solely for the purpose of satisfying
subrule format requirements. The values of S and F instructions refer to
numbers of subrules in this text. In cases of success (S), the search is
continued in the next packet of subrules, starting with (15.1).

(14.1)    C  RANK/X-ONES, TEENS
        + C  RANK/ONES
        = DUMMY  (S/(15.1)  F/(14.2))

(14.2)    C  RANK/X-ONES, TEENS, TENS
        + C  RANK/TEENS, TENS
        = DUMMY  (S/(15.1)  F/(14.3))

(14.3)    C  RANK/X-ONES, TEENS, TENS, HUNDRD
        + C  RANK/HUNDRD
        = DUMMY  (S/(15.1)  F/(14.4))

(14.4)    C  RANK/X-THOUSD
        + C  RANK/THOUSD
        = DUMMY  (S/(15.1)  F/(14.5))

(14.5)    C  RANK/X-THOUSD, MILION
        + C  RANK/MILION
        = DUMMY  (S/(15.1)  F/(14.6))

(14.6)    C  RANK/X-THOUSD, MILION, BILION
        + C  RANK/BILION
        = DUMMY  (S/(15.1)  F/QUIT)
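
The ordered packet above can be read as a small decision table. The following Python sketch is an illustration only, not part of the CSA system: it assumes each constituent carries a single RANK value, reads (14.3) with the exclusion prefix X- as in Table II-4, and uses the hypothetical names RANK_PACKET and may_concatenate.

```python
# Illustrative sketch: subrules (14.1)-(14.6) encoded as ordered pairs of
# (RANK values excluded on C1, RANK values accepted on C2).
RANK_PACKET = [
    ({"ONES", "TEENS"}, {"ONES"}),                      # (14.1)
    ({"ONES", "TEENS", "TENS"}, {"TEENS", "TENS"}),     # (14.2)
    ({"ONES", "TEENS", "TENS", "HUNDRD"}, {"HUNDRD"}),  # (14.3)
    ({"THOUSD"}, {"THOUSD"}),                           # (14.4)
    ({"THOUSD", "MILION"}, {"MILION"}),                 # (14.5)
    ({"THOUSD", "MILION", "BILION"}, {"BILION"}),       # (14.6)
]


def may_concatenate(rank_c1: str, rank_c2: str) -> bool:
    """Return True if some subrule succeeds (transfer S/(15.1)).

    False corresponds to falling through every F transfer to F/QUIT.
    """
    for excluded_on_c1, accepted_on_c2 in RANK_PACKET:
        if rank_c2 in accepted_on_c2 and rank_c1 not in excluded_on_c1:
            return True
    return False


if __name__ == "__main__":
    print(may_concatenate("TENS", "ONES"))      # SOROK TRI
    print(may_concatenate("ONES", "ONES"))      # two units in a row
    print(may_concatenate("THOUSD", "MILION"))  # thousand before million
```

Because a composite constituent keeps the RANK of its C1, the same check can be reapplied when longer chains are attempted.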

Subrules (14.1)-(14.6) deal only with the problem of concatenation. In order
to describe the grammatical links which must be considered in addition, it
will be convenient first to introduce some additional notation characterizing
classes of numeral strings. Let cardinal numerals and combinations of
numerals and numeral nouns formed according to the restrictions stated in
(14.1)-(14.6) be denoted by two-character symbols beginning with "M", with
MS representing a single numeral (Group A in Table II-3) and MN a numeral
noun (Group B in Table II-3). Further, let M1-M4 represent combinations of
numerals and numeral nouns differentiated according to (a) presence or
absence of numeral nouns and (b) the relative position of the numeral noun,
factors which determine the nature of the grammatical tests that must be
carried out when checking the validity of each potential combination. The
information presented in Table II-5 is expressed in an ordered set of
subrules (15.1)-(15.5) below. For the sake of illustration, the "M-symbol"
appears as the value of the attribute MTYPE - "M-type". (Although the use of
this tag could be avoided through employment of an elaborate system of
exclusion tags, the resultant rules would be much more difficult to follow and
to describe.) The five subrules (15.1)-(15.5) constitute five "steps" that
determine which of four "tests" is necessary to complete the recognition of
a numeral string.


Table II-5: Permissible Combinations of M-Constituents

Line  Hypothetical rule  Test  Russian example                          Numeric equivalent

a     MS + MS = M1       1     SOROK TRI                                43
b     MS + M1 = M1       1     STO SOROK TRI                            143
c     MS + MN = M2       2     TRI TYS4CI                               3,000
d     M1 + MN = M2       2     SOROK TRI TYS4CI                         43,000
e     M2 + MS = M3       4     TRI TYS4CI DVA                           3,002
f     M2 + M1 = M3       4     TRI TYS4CI SOROK TRI                     3,043
g     M2 + M2 = M3       4     DVA MILLIONA SOROK TYS4C                 2,040,000
h     M2 + M3 = M3       4     DVA MILLIARDA MILLION SOROK TRI TYS4CI   2,001,043,000
i     M2 + MN = M3       4     DVA MILLIONA TYS4CA                      2,001,000
j     MN + MS = M3       3     TYS4CA DVA                               1,002
k     MN + M1 = M3       3     TYS4CA SOROK TRI                         1,043
l     MN + M2 = M3       4     MILLION SOROK TRI TYS4CI                 1,043,000
m     MN + M3 = M3       4     MILLIARD ODIN MILLION TYS4CA SOROK TRI   1,001,001,043
n     MN + MN = M3       4     MILLION TYS4CA                           1,001,000
o     MN + MS = M4       3     TYS4CI DVE                               about 2,000
p     MN + M1 = M4       3     TYS4CI SOROK TRI                         about 43,000



Step 1. If C1 is an MS and C2 is an MS or M1, go to Test 1 - subrule
        (16.1); otherwise (15.2).

(15.1)    C  MTYPE/MS
        + C  MTYPE/MS, M1
        = DUMMY  (S/(16.1)  F/(15.2))

Step 2. If C1 is an MS or M1 and C2 is an MN, go to Test 2 - subrule
        (16.2); otherwise (15.3).

(15.2)    C  MTYPE/MS, M1
        + C  MTYPE/MN
        = DUMMY  (S/(16.2)  F/(15.3))

Step 3. If C1 is an MN and C2 is an MS or M1, go to Test 3 - subrule
        (16.7); otherwise (15.4).

(15.3)    C  MTYPE/MN
        + C  MTYPE/MS, M1
        = DUMMY  (S/(16.7)  F/(15.4))

Step 4. If C1 is an M2 and C2 is any M, except M4, go to Test 4 -
        subrule (16.12); otherwise (15.5).

(15.4)    C  MTYPE/M2
        + C  MTYPE/X-M4
        = DUMMY  (S/(16.12)  F/(15.5))

Step 5. If C1 is an MN and C2 is any M, except M4, MS, or M1, go to
        Test 4 - subrule (16.12); otherwise QUIT.

(15.5)    C  MTYPE/MN
        + C  MTYPE/X-M4, MS, M1
        = DUMMY  (S/(16.12)  F/QUIT)
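
The five steps amount to a dispatch on the pair of MTYPE values. The sketch below is a hypothetical Python rendering, not CSA code; the names STEPS and select_test are assumptions, and each constituent is assumed to carry exactly one MTYPE value.

```python
from typing import Optional

# All M-symbols introduced in the text.
M_TYPES = {"MS", "MN", "M1", "M2", "M3", "M4"}

# Ordered steps: (MTYPE values accepted on C1, values accepted on C2,
# subrule reached on success). An "X-" exclusion is written as a set difference.
STEPS = [
    ({"MS"}, {"MS", "M1"}, "16.1"),                    # (15.1) -> Test 1
    ({"MS", "M1"}, {"MN"}, "16.2"),                    # (15.2) -> Test 2
    ({"MN"}, {"MS", "M1"}, "16.7"),                    # (15.3) -> Test 3
    ({"M2"}, M_TYPES - {"M4"}, "16.12"),               # (15.4) -> Test 4
    ({"MN"}, M_TYPES - {"M4", "MS", "M1"}, "16.12"),   # (15.5) -> Test 4
]


def select_test(mtype_c1: str, mtype_c2: str) -> Optional[str]:
    """Return the first subrule of the selected test, or None for F/QUIT."""
    for c1_ok, c2_ok, target in STEPS:
        if mtype_c1 in c1_ok and mtype_c2 in c2_ok:
            return target
    return None


if __name__ == "__main__":
    for pair in [("MS", "MS"), ("M1", "MN"), ("MN", "M1"), ("M2", "M3"), ("M3", "MS")]:
        print(pair, select_test(*pair))
```

Each failed step simply falls through to the next entry, which mirrors the F transfers in (15.1)-(15.5).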

Test 1. Here only agreement in case is required. The value of RANK on
        C1 is copied onto the corresponding attribute of C3 since this
        information is necessary for the tests in (14.1)-(14.6), which will be
        reapplied when attempting to form still longer chains. The value
        of RANK on C2 is copied onto RANKRT - "rank on the right" in
        order to preserve information necessary for other rules which may
        subsequently apply. CCX3 is a locally important tag which, like
        other CCX-prefixed tags, is used to insure a given order of
        concatenation.

(16.1)    C  CP/X  RANK/Y  CCX3/*
        + C  CP/X  RANK/Z
        = C  MTYPE/M1  CP/X  RANK/Y  RANKRT/Z
             CCX3/PLUS  ETC/2  (S/QUIT  F/QUIT)


Test 2. In (16.2), provisions are made for S1 numerals which, as shown in
        Table II-2, must agree in case (CP), number and gender (NG) with
        the numeral noun. In (16.3), all other numerals in oblique cases need
        agree only in case. Subrules (16.4)-(16.6) test for situations where
        the first numeral is in the nominative or accusative case (CP/N, A)
        and governs the second numeral in the genitive case (CP/G) in
        accordance with the conditions stated in Table II-2. Hence, in (16.4),
        which applies to S2 numerals, agreement in gender is required; in
        (16.5), agreement in gender is not required, but the noun must be
        singular; in (16.6), the tests of (16.5) are repeated with the
        exception of the number (NG) of C2, which must be plural. Since (16.6)
        is the last rule of the particular subpacket, further search is
        terminated in instances of either success (S/QUIT) or failure (F/QUIT).

(16.2)    C  SC/S1  CP/X  NG/Y  RANK/XA
        + C  CP/X  NG/Y  RANK/XB
        = C  MTYPE/M2  CP/X  NG/Y  RANK/XA  RANKRT/XB
             ETC/2  (S/QUIT  F/(16.3))

(16.3)    C  SC/XC-S1  CP/X-N, A  CP/Z  RANK/XA
        + C  CP/Y-N, A  CP/Z  RANK/XB
        = C  MTYPE/M2  CP/Z  RANK/XA  RANKRT/XB  ETC/2
             (S/QUIT  F/(16.4))

(16.4)    C  SC/S2  CP/N, A  CP/X  NG/Y  RANK/XA
        + C  CP/G  NG/Y-P  RANK/XB
        = C  MTYPE/M2  CP/X  NG/Y  GC/G  RANK/XA
             RANKRT/XB  ETC/2  (S/QUIT  F/(16.5))

(16.5)    C  SC/S3  CP/N, A  CP/X  NG/Y  RANK/XA
        + C  CP/G  NG/Z-P  RANK/XB
        = C  MTYPE/M2  CP/X  NG/Y  GC/G  RANK/XA
             RANKRT/XB  ETC/2  (S/QUIT  F/(16.6))

(16.6)    C  SC/S5, H  CP/N, A  CP/X  NG/Y  RANK/XA
        + C  CP/G  NG/P  RANK/XB
        = C  MTYPE/M2  CP/X  NG/Y  GC/G  RANK/XA
             RANKRT/XB  ETC/2  (S/QUIT  F/QUIT)

Test 3. The tests in this packet of subrules are intended for the recognition
        of approximate quantities shown in lines "o" and "p" of Table II-5.
        The restrictions on the use of inversion of normal numeral-noun
        sequence to express approximation are not clear and the conditions
        imposed here are accordingly based purely on Sprachgefühl.

        In oblique cases it is necessary to permit both an M3 and an M4 to
        be generated. Subrules (16.7) and (16.8) require case agreement
        only.

(16.7)    C  CP/Z  RANK/XA
        + C  CP/Z  RANK/XB
        = C  MTYPE/M3  CP/Z  RANK/XA  RANKRT/XB  ETC/2
             (S/(16.8)  F/(16.9))

(16.8)    C  CP/X-N, A  CP/Z  RANK/XA
        + C  CP/Y-N, A  CP/Z  RANK/XB
        = C  MTYPE/M4  CP/Z  RANK/XA  RANKRT/XB  ETC/2
             (S/QUIT  F/QUIT)

        The next three subrules reverse the order of C1 and C2 of the
        corresponding subrules (16.4)-(16.6). The value of MTYPE is set to
        M4.

(16.9)    C  CP/G  NG/Y-P  RANK/XA
        + C  SC/S2  CP/N, A  CP/X  NG/Y  RANK/XB
        = C  MTYPE/M4  CP/X  NG/Y  GC/G  RANK/XA
             RANKRT/XB  ETC/1  (S/QUIT  F/(16.10))

(16.10)   C  CP/G  NG/Z-P  RANK/XA
        + C  SC/S3  CP/N, A  CP/X  NG/Y  RANK/XB
        = C  MTYPE/M4  CP/X  NG/Y  GC/G  RANK/XA
             RANKRT/XB  ETC/1  (S/QUIT  F/(16.11))

(16.11)   C  CP/G  NG/P  RANK/XA
        + C  SC/S5, H  CP/N, A  CP/X  NG/Y  RANK/XB
        = C  MTYPE/M4  CP/X  NG/Y  GC/G  RANK/XA
             RANKRT/XB  ETC/1  (S/QUIT  F/QUIT)


Test 4. This test repeats the requirements of (16.1) in Test 1; however,
        the value of MTYPE is set to M3 rather than M1.

(16.12)   C  CP/X  RANK/Y
        + C  CP/X  RANK/Z
        = C  MTYPE/M3  CP/X  RANK/Y  RANKRT/Z  ETC/2
             (S/QUIT  F/QUIT)

        Note: The CCX3 is unimportant in (16.12) and is therefore
        not used.


Comparison of RG2 and proposed rules


Although, as was shown above, the six subrules for recognition of
cardinal numerals contained in RG2 could have been reduced to two, (12)
and (13), the more comprehensive treatment of the problem proposed here
would require some twenty-three subrules. The detailed treatment of
numeral strings in the illustration is itself incomplete, since it neglects some
of the problems which can be caused by the non-standard usage of SC/H
numerals (hundreds) and does not fully consider the functions of SC/T
numerals (thousands). However, these gaps appear to be relatively insignificant.

3. Noun-Noun Predications

Two nouns in the nominative case which, in addition, usually agree
in number, gender, and animateness can combine to form a predication or
an apposition. By way of general comment, two nouns are seldom
encountered without either appropriate punctuation (a dash) or additional words
such as the negative particle NE when one of them acts as a subject and the
other as the corresponding predicate. Thus, $FIZIKA -- NAUKA ('Physics
is a science') or $ALXIMI4 NE NAUKA ('Alchemy is not a science'). It is
only with shorter sentences and then in instances approaching colloquial
usage that sentences like $MOI BRAT UCITEL6 ('My brother is a teacher')
are possible. Primarily for this reason, RG2 did not contain a specific
N + N subrule producing a predication.

4. Close Appositions


The Russian term for appositives (prilozhenie) is a calque from the
Latin appositio. However, related etymologies notwithstanding, the
respective usage of the two terms differs considerably in grammars of English
and of Russian. The use of the terms appositive and apposition in the
present context is limited to constructions described in the present
subsection. In order to develop grammar rules for the recognition of close
appositions, i.e., those whose components are not separated by punctuation, a
detailed subclassification of nouns is required. Only a broad outline of such
a subclassification is presented at this time.


The subclassification proposed in Figure II-1 is based on
requirements of Russian grammar; stylistic considerations have influenced only the
relative emphasis on certain construction types. Examples illustrating the
subclasses shown in Figure II-1 and one possible way in which the relevant
information can be expressed in tag notation appear in Table II-6. The tags
introduced in Table II-6 are defined in Table II-7.


The characteristics of the twelve main subgroups within close
appositions, summarized in Table II-8, are briefly commented on below. In
each instance, a sketch of the linguistic background is followed by proposed
recognition rules.


A1 Subgroup: Two personal names

The only instance of a true personal name appositive construction
is not discussed here. However, for functional reasons, it is convenient
to create quasi-appositions consisting of personal names. Russian and
foreign names are considered separately because of differences in patterns of
agreement.

Table II-7: Attributes and Values of Tags Used in Table II-6

Attribute                     Possible values
Name     Meaning              (meaning of each value follows in parentheses)

ANIMTE   Animate              ANIMAL (animal), HUMAN (human)
FOREGN   Foreign              PLUS (plus)
GENUS    Genus                ANIMAL (animal), BOTAN (botanical),
                              OTHER (other)
GROUPA   Close Apposition     PLUS (plus), MINUS (minus)
INANIM   Inanimate            BOTAN (botanical), GEOGR (geographical),
                              NOMEN (nomenclature item), OTHER (other)
NAME     Name                 FIRST (first name), LAST (last name),
                              PATRO (patronymic)
NCLASS   Noun class           COMMON (common noun), EPITH (epithet),
                              NOMEN (nomenclature item), PROPER (proper
                              name)
RUSIAN   Russian              PLUS (plus)
SPCIES   Species              ANIMAL (animal), BOTAN (botanical),
                              OTHER (other)
TESTA    Test A               ADJREL (adjective-related), NONADJ (not
                              adjective-related)

Table II-8: Notes to column "other tags"

(1) See discussion of A1 subgroup.

(2) The first noun must have the attribute GROUPA.

(3) The first noun must have the tag TESTA/ADJREL.

(4) The first noun must have the tag TESTA/NONADJ.

(5) The first noun must have the attribute GENUS and the second noun
    the tag SPCIES, both with the value of the ANIMTE or INANIM
    attribute, as the case may be.

Russian personal names

First names (NAME/FIRST), patronymics (NAME/PATRO) and
last names (NAME/LAST) have been identified in Figure II-1 and
Table II-6. In addition, a constituent consisting of a first name and
a patronymic (NAME/FPATRO) must also be considered. If $IVAN
('Ivan'), $IVANOVIC ('Ivanovich'), and $PETROV ('Petrov') are taken
as representative examples, the following five constructions are
possible.

a. First name and patronymic -- $IVAN $IVANOVIC

b. First and last name -- $IVAN $PETROV

c. First name, patronymic, last name -- $IVAN $IVANOVIC $PETROV

d. Last name, first name -- $PETROV $IVAN

e. Last name, first name, patronymic -- $PETROV $IVAN $IVANOVIC

All items in a-e above must agree in case, number, and gender.


Foreign personal names


Only constructions g and h below will be discussed here.

g. Several first names agreeing in case followed by a last name in the
   same case. For instance, $FRANKLIN $ROBERT $RICARDSON
   ('Franklin Robert Richardson').

h. One or more first names agreeing in case followed by a foreign
   feminine last name (which is identical to the nominative
   masculine form). For example, $3LEANORA $RUZVEL6T ('Eleanor
   Roosevelt').

i. The version of both constructions where the last name occurs first
   is possible in principle, but is not considered here despite the fact
   that some constructions could be recognized by subrule (23) below.

Proposed rules for A1 subgroup

Step 1. It is necessary to establish that the two nouns (N) one is to deal
        with in this packet are both proper names (NCLASS/PROPER)
        denoting human beings (ANIMTE/HUMAN).

(17)      N  NCLASS/PROPER  ANIMTE/HUMAN
        + N  NCLASS/PROPER  ANIMTE/HUMAN
        = DUMMY  (S/(18)  F/(24))

Step 2. In the next two subrules, agreement tests are performed.

(18)      N  CP/X  NG/Y
        + N  CP/X  NG/Y
        = DUMMY  (S/(20)  F/(19))

        Comment: If the two nouns agree in case (CP) and
        number-gender (NG), go to subrule (20); otherwise
        (19).

(19)      N  CP/X  NG/F  NAME/FIRST
        + N  CP/N  NG/F  NAME/LAST  FOREGN/PLUS  NNX1/*
        = N  CP/X  NG/F  NAME/LAST  FOREGN/PLUS
             NNX1/FIRST  ETC/2  (S/QUIT  F/QUIT)

        Comment: This subrule is designed to handle (g) and (h)
        above. Any feminine (NG/F) first name (NAME/FIRST)
        in any case (CP/X) can combine with a foreign (FOREGN/
        PLUS) feminine (NG/F) last name (NAME/LAST) which is
        in the nominative (CP/N) and does not have the tag NNX1/
        FIRST. The C3 is a foreign feminine last name in the same
        case as the C1; all other tags not mentioned in the subrule
        are copied from the C2 (ETC/2). NNX1 is a locally
        important tag (the first exclusion tag in noun-noun rules)
        which enforces left-branching bracketing.

Step 3. Next, proper sequence of first names, patronymics, last names,
        and their combinations is enforced by subrules (20)-(23).

(20)      N  NAME/FIRST  NNX1/*
        + N  NAME/FIRST
        = N  NAME/FIRST  NNX1/FIRST  ETC/2  (S/QUIT  F/(21))

        Comment: Subrule normally applies only to foreign names.
        Any number of first names (g) can be combined through
        repeated applications of (20), resulting in a first name
        which has the tag NNX1/FIRST. Since there are no
        restrictions regarding NNX1 on C1 in (19), that subrule
        can then be used to recognize varieties of (g) and (h)
        above.

(21)      N  NAME/FIRST  NNX1/*
        + N  NAME/PATRO
        = N  NAME/FPATRO  ETC/2  (S/QUIT  F/(22))

        Comment: Since a patronymic (NAME/PATRO) can only
        combine with a single first name, output of (20) is
        excluded (NNX1/* on C1). This subrule is designed to
        handle (a) above.

(22)      N  NAME/FIRST, FPATRO  NAME/X
        + N  NAME/LAST  NNX1/*
        = N  NAME/LAST  NNX1/X  ETC/2  (S/QUIT  F/(23))

        Comment: This subrule is similar to (19) and is intended
        to recognize (b), (c), and (g). Any first name or (first
        name)-patronymic construction can combine with a last
        name not produced by (19), (22), or (23). The tag NAME
        is repeated on C1 in order to copy its value onto NNX1 on
        C3.

(23)      N  NAME/LAST  NNX1/*
        + N  NAME/FIRST, FPATRO  NAME/X
        = N  NAME/LAST  NNX1/X  ETC/2  (S/QUIT  F/QUIT)

        Comment: The C1-C2 sequence of (22) is reversed here.
        C3 differs in the value of the F instruction because this is
        the last subrule of the packet. Items in (d) and (e) are
        recognized by this rule, as are items in (i) corresponding
        to (g), or to instances of (h) where all constituents are in
        the nominative case.
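
The sequencing logic of (20)-(23) can be tried out with a short Python sketch. It is an illustration only and makes several simplifying assumptions: agreement in case and number-gender (subrules (18) and (19)) is taken as already checked, each constituent is a dictionary with a single NAME value, `NNX1/*` is taken to mean that NNX1 is absent, and combine_names is a hypothetical name.

```python
from typing import Optional


def combine_names(c1: dict, c2: dict) -> Optional[dict]:
    """Try subrules (20)-(23) in order; return C3 or None (F/QUIT)."""
    n1, n2 = c1["NAME"], c2["NAME"]
    if n1 == "FIRST" and "NNX1" not in c1 and n2 == "FIRST":              # (20)
        return {**c2, "NAME": "FIRST", "NNX1": "FIRST"}
    if n1 == "FIRST" and "NNX1" not in c1 and n2 == "PATRO":              # (21)
        return {**c2, "NAME": "FPATRO"}
    if n1 in ("FIRST", "FPATRO") and n2 == "LAST" and "NNX1" not in c2:   # (22)
        return {**c2, "NAME": "LAST", "NNX1": n1}
    if n1 == "LAST" and "NNX1" not in c1 and n2 in ("FIRST", "FPATRO"):   # (23)
        return {**c2, "NAME": "LAST", "NNX1": n2}
    return None


if __name__ == "__main__":
    ivan = {"NAME": "FIRST"}
    ivanovich = {"NAME": "PATRO"}
    petrov = {"NAME": "LAST"}
    first_patro = combine_names(ivan, ivanovich)   # construction (a)
    print(combine_names(first_patro, petrov))      # construction (c)
    print(combine_names(petrov, first_patro))      # construction (e)
```

In this reading the NNX1 tag left on a completed name blocks any further extension, so each name string receives one bracketing only.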


A2 and A3 Subgroups: Common noun - proper name, both animate

The common noun is usually a "title" describing a human being by
reference to profession, social, political, military and other ranks,
national or regional origin, personal characteristics or qualities, and the like
(lines 7-9 of Table II-6). Any personal name in lines 2-6 of Table II-6 or
A1 subgroup quasi-apposition can be used as the second component.
Varieties of agreement are affected by the peculiarities of the proper name.
Instances where A2-appositions contain several "titles" are not discussed;
the same applies to instances where a component is itself an apposition (cf.
note 29, example (6)).

Proposed rules for A2 and A3 subgroups


The packet of subrules described below contains the subrules
necessary for the recognition of constructions in the A2 and A3 subgroups.

Step 1. Before establishing the necessary agreement requirements in
        Step 2, it is necessary to ascertain that the pair of nouns meets
        the broad requirements of the subrule packet (next three
        subrules).

(24)      N  NCLASS/COMMON  GROUPA/PLUS
        + N  NCLASS/PROPER
        = DUMMY  (S/(25)  F/(35))

        Comment: If C1 is a common noun (NCLASS/COMMON)
        and can be a member of close (Group A) appositions
        (GROUPA/PLUS) and if C2 is a proper name (NCLASS/
        PROPER), go to (25); otherwise, go to (35).

(25)      N  ANIMTE/HUMAN
        + N  ANIMTE/HUMAN
        = DUMMY  (S/(27)  F/(26))

        Comment: If both nouns denote human beings (ANIMTE/
        HUMAN) and hence may belong to subgroup A2, go to
        subrule (27); otherwise, go to (26).

(26)      N  ANIMTE/ANIMAL
        + N  ANIMTE/X
        = DUMMY  (S/(30)  F/(31))

        Comment: If the C1 is an "animal" (ANIMTE/ANIMAL),
        C2 can be either animal or human (ANIMTE/X). If (26)
        is successful, indicating potential membership in
        subgroup A3, subrule (30) should be accessed; otherwise go
        to (31).

Step 2. The next four subrules establish agreement requirements.

(27)      N  CP/X  NG/Y  APPOS/*
        + N  CP/X  NG/Y
        = N  CP/X  NG/Y  APPOS/A2  ETC/1  (S/(47)  F/(28))

        Comment: This subrule imposes the same agreement
        requirements as does (18). In order to restrict C1 to a
        simple common noun, the tag APPOS/* is employed. Note
        that although an apposition is created, it is necessary to
        test for a possible second analysis where C1 governs C2.
        For example, DRUG SESTRY $MARII can have two
        analyses corresponding to (a) 'the friend of sister Mary' and
        (b) 'the friend of Mary's sister'. Hence, (S/(47) F/(28)).
        (Although such possibilities of government may also exist
        for other subgroups of appositions, they are not explicitly
        indicated in the transfer sections of subrules for subgroups
        A4-A12.)

(28)      N  CP/X  NG/M  APPOS/*
        + N  CP/X  NG/F  RUSIAN/PLUS
        = N  CP/X  NG/M, F  APPOS/A2  ETC/1  (S/(47)  F/(29))

        Comment: This subrule differs from the preceding one in
        allowing C1 to be masculine (NG/M) and C2 to be feminine
        (NG/F). However, C2 (the proper name in this instance)
        must be a Russian personal name (RUSIAN/PLUS). (Cf.
        the example INJENER $L4POVA.)

(29)      N  CP/X  NG/Y-P  APPOS/*
        + N  CP/N  NG/F  FOREGN/PLUS
        = N  CP/X  NG/Y, F  APPOS/A2  ETC/1  (S/QUIT  F/(30))

        Comment: This subrule is similar to (19) in the type of
        agreement requirements. Subject to (24), it allows a
        foreign feminine name in the nominative case to combine
        with a noun in any case (CP/X) and of any gender agreeing
        with it in number (NG/Y-P). C3 acquires the case, the
        number-gender, and all other attributes of C1. However,
        in addition to all other possible values of NG on C3 it also
        will have feminine (F).

(30)      N  CP/X  NG/Y-P  ANIMTE/ANIMAL  APPOS/*
        + N  CP/X  NG/Z-P
        = N  CP/X  NG/Y  ANIMTE/ANIMAL  APPOS/A3  ETC/1
             (S/(47)  F/(47))

        Comment: This subrule is restricted in its application to
        names given to animals, which may (on rare occasions)
        disagree in gender with the common noun.


A4 Subgroup: Geographical designations

The two options of agreement between nouns in this construction
are generally inadequately distinguished according to differences in usage
in Russian grammar and are here considered interchangeable. The features
shown for constituents in this subgroup of appositions are insufficient to
avoid the problems mentioned in note 39: a sentence like ZA GOROI
$VOLGA STANOVITS4 XOLODNEE can have two analyses corresponding to
(a) 'Beyond the mountain, the Volga becomes cooler' and (b) 'Beyond Mt.
Volga it becomes cooler'.

Proposed rules


The packet of subrules required for recognition of A4 appositions
will be described together with those for the A5 subgroup.

A5 Subgroup: Botanical names

Capitalization of names of botanical species is restricted to
specialized texts, with the constructions appearing in the A8 subgroup more
commonly used instead. C2 in A5 appositions is always in the nominative.

Proposed rules for the A4 and A5 subgroups

The first subrule of the present packet is accessed following failure
of (26).

(31)      N  INANIM/BOTAN, GEOGR  INANIM/X
        + N  INANIM/BOTAN, GEOGR  INANIM/X
        = DUMMY  (S/(32)  F/QUIT)

Comment: Both nouns must be either botanical or geographic as
shown by the repetition of the INANIM - "inanimate" tag.

(32)      N  INANIM/BOTAN  APPOS/*
        + N
        = DUMMY  (S/(34)  F/(33))

Comment: Establish that both nouns are "botanical".

(33)      N  CP/X  NG/Y-P
        + N  CP/X  NG/Z-P
        = N  CP/X  NG/Y  APPOS/A4  ETC/1  (S/QUIT  F/(34))

Comment: Handles pairs of geographic nouns. Agreement in case
is required. Number must not be plural.

(34)      N  CP/X
        + N  CP/N
        = N  CP/X  APPOS/A5  ETC/2  (S/QUIT  F/QUIT)

Comment: The agreement restrictions are similar to those in (29).

A6 Subgroup: Two common nouns (human beings)

The recognition of the two types of constructions in this subgroup
would require the following subrules.

(35)      N  NCLASS/COMMON  GROUPA/PLUS
        + N  NCLASS/COMMON  GROUPA/PLUS
        = DUMMY  (S/(36)  F/(45))

Comment: If the two nouns are common nouns which can be members
of Group A appositions (GROUPA/PLUS), go to (36); otherwise, go to
(45).

(36)      N  ANIMTE/HUMAN
        + N  ANIMTE/HUMAN
        = DUMMY  (S/(37)  F/(42))

Comment: If both nouns are animate and human, go to (37); otherwise
go to (42).

(37)      N  TESTA/ADJREL
        + N
        = DUMMY  (S/(39)  F/(38))

Comment: If C1 is of the type STARIK (line 7, Table II-6), go to (39);
otherwise, go to (38).

(38)      N  TESTA/NONADJ
        + N
        = DUMMY  (S/(40)  F/QUIT)

Comment: If C1 is of the type TOVARI5 (line 8, Table II-6), go to (40);
otherwise, QUIT.

(39)      N  CP/X  NG/Y
        + N  CP/X  NG/Y
        = N  CP/X  NG/Y  APPOS/A6  ETC/2  (S/QUIT  F/QUIT)

Comment: This subrule requires complete agreement in case (CP),
number, and gender (NG). It could have been combined with (37) into
a single subrule.

(40)      N  CP/X  NG/Y
        + N  CP/X  NG/Y
        = N  CP/X  NG/Y  APPOS/A6  ETC/2  (S/QUIT  F/(41))

Comment: This subrule is identical to (39) except for the fact that in
case of failure another subrule (41) should be accessed.

(41)      N  CP/X  NG/Y-P
        + N  CP/X  NG/Z-P
        = N  CP/X  NG/Y, Z  APPOS/A6  ETC/2  (S/QUIT  F/QUIT)

Comment: This subrule is similar to (40); gender agreement is
relaxed in order to allow such constructions as TOVARI5 MEDSESTRA
('comrade nurse').

A7-A9 Subgroups: Two common nouns (other than human beings)

The subrules for the three subgroups can be considered collectively
and are accessed following failure of (36).

(42)      N  ANIMTE/ANIMAL  GENUS/X
        + N  ANIMTE/ANIMAL  SPCIES/X
        = DUMMY  (S/(44)  F/(43))

Comment: If both nouns are ANIMTE/ANIMAL and C1 denotes a
genus, C2 a species, go to (44); otherwise, go to (43).

(43)      N  INANIM/BOTAN, OTHER  INANIM/X  GENUS/Y
        + N  INANIM/BOTAN, OTHER  INANIM/X  SPCIES/Y
        = DUMMY  (S/(44)  F/QUIT)

Comment: If both nouns agree in the value of the attribute INANIM
(inanimate) and C1 denotes a genus, while C2 denotes a species, go
to (44); otherwise, QUIT.

(44)      N  CP/X
        + N  CP/X
        = N  CP/X  ETC/1  (S/QUIT  F/QUIT)

Comment: This subrule could have been incorporated into (42) and
(43). Agreement in case is required. In order to require agreement
in number, another subrule would have to be added. One subrule
would have to have the values of NG set to P, the other to X-P on
both the C1 and C2 of the respective rules.
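
The S and F transfers of a packet can be followed mechanically. The Python sketch below is a hypothetical illustration of that control flow for (42)-(44), not the CSA parser: constituents are dictionaries, CP holds a set of case values, the other tags hold a single value, and the names PACKET and run_packet are assumptions.

```python
QUIT = "QUIT"


def genus_species_animal(c1, c2):                       # test of (42)
    return (c1.get("ANIMTE") == "ANIMAL" and c2.get("ANIMTE") == "ANIMAL"
            and "GENUS" in c1 and c1["GENUS"] == c2.get("SPCIES"))


def genus_species_inanimate(c1, c2):                    # test of (43)
    return (c1.get("INANIM") in ("BOTAN", "OTHER")
            and c1["INANIM"] == c2.get("INANIM")
            and "GENUS" in c1 and c1["GENUS"] == c2.get("SPCIES"))


def case_agreement(c1, c2):                             # test of (44)
    return bool(c1["CP"] & c2["CP"])


def build_44(c1, c2):
    c3 = dict(c1)                                       # ETC/1: copy tags of C1
    c3["CP"] = c1["CP"] & c2["CP"]                      # CP/X: shared case values
    return c3


# label -> (test, builder or None for DUMMY, S transfer, F transfer)
PACKET = {
    "42": (genus_species_animal, None, "44", "43"),
    "43": (genus_species_inanimate, None, "44", QUIT),
    "44": (case_agreement, build_44, QUIT, QUIT),
}


def run_packet(c1, c2, start="42"):
    """Follow S/F transfers until QUIT; return the constituent built, if any."""
    label, result = start, None
    while label != QUIT:
        test, build, on_success, on_failure = PACKET[label]
        if test(c1, c2):
            if build is not None:
                result = build(c1, c2)
            label = on_success
        else:
            label = on_failure
    return result


if __name__ == "__main__":
    genus = {"INANIM": "BOTAN", "GENUS": "BOTAN", "CP": {"N", "A"}}
    species = {"INANIM": "BOTAN", "SPCIES": "BOTAN", "CP": {"N"}}
    print(run_packet(genus, species))
```

Subrules that yield DUMMY only route the search, so only (44) has a builder in this table.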

A10 and A11 Subgroups: Constructions with an epithet

Since these constructions are rare, only one subrule is given as an
illustration.

(45)      N  NCLASS/COMMON, PROPER  NCLASS/X  ANIMTE/Y
        + N  NCLASS/EPITH  ANIMTE/Y
        = N  NCLASS/X  ANIMTE/Y  ETC/1  (S/QUIT  F/(46))

Comment: If the C1 is either a proper name or a common noun and C2
is an epithet agreeing with it in the type of animateness (ANIMTE/Y),
a new constituent is produced which is similar to C1. Otherwise, go
to (46).

A12 Subgroup: Constructions with nomenclature items

Although these are very frequently encountered constructions,
adequate restrictions are very difficult to work out. The subrule follows.

(46)      N  NCLASS/COMMON  INANIM/X
        + N  NCLASS/NOMEN
        = N  NCLASS/COMMON  INANIM/X  APPOS/A12  ETC/1
             (S/QUIT  F/QUIT)

Comment: If an inanimate common noun is left-adjacent to a
nomenclature item, the resultant constituent is given the same tags as the
common noun.


Comparison of RG2 and proposed rules

In view of the detailed treatment of close appositions presented above,
it is necessary to mention only briefly the relevant RG2 rules.

(1) All nouns were divided into three groups:

    (a) "titles" -- words of the type INJENER ('engineer') (lines 7-9 of
        Table II-6) and also those like GOROD (line 20, Table II-6)
    (b) "names" -- proper names
    (c) all other "regular" nouns.

(2) The only constructions considered were those presented in the A1, A2,
    and A4 subgroups shown in Table II-8.

(3) Generally speaking, agreement requirements were not as well
    developed as they are in the proposed rules.


5.  Word  Groups  Formed  by  a  Governing  Noun 

Government rules are basically simple. As shown in (47), the attributes
GC (government) on C1 and CP (case) on C2 must be set to the same
value.

(47)  N CP/Y GC/X
      + N CP/X
      = N CP/Y GC/1-X ETC/1

Comment: The value of CP on C1 is set to another variable in order
to copy the value in CP onto C3. The value "1-X" of GC on C3
should be interpreted as follows: "Copy all values of GC on C3
except X, as defined in the subrule."
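
The test and copy operations of (47) can be sketched in Python. This is a
minimal illustration, not the CSA implementation: the dictionary
representation of constituents, the function name govern, and the case
codes "G" and "D" are assumptions made for the example. The notation
GC/1-X is read here as "the GC values of the first constituent with X
removed", which is an interpretation suggested by the parallel use of
GC/2-X in (53) and (55).

```python
def govern(c1, c2):
    """Sketch of subrule (47): N CP/Y GC/X + N CP/X = N CP/Y GC/1-X ETC/1.

    Constituents are dicts: "POS" is a string, every other key maps an
    attribute name to a set of values (an assumed representation).
    Returns one result per value of X that satisfies the test.
    """
    if c1["POS"] != "N" or c2["POS"] != "N":
        return []
    # X must be both a government value on C1 and a case value on C2.
    candidates = c1.get("GC", set()) & c2.get("CP", set())
    results = []
    for x in sorted(candidates):
        # ETC/1: all tags of C1 (including CP/Y) are carried over to C3.
        c3 = {k: (v if k == "POS" else set(v)) for k, v in c1.items()}
        # GC/1-X: the government values of C1 with the satisfied X removed.
        c3["GC"] = c1["GC"] - {x}
        results.append(c3)
    return results


# Hypothetical coding: the head noun is nominative ("N") and can govern a
# genitive ("G") or a dative ("D"); the dependent noun is genitive.
head = {"POS": "N", "CP": {"N"}, "GC": {"G", "D"}}
dependent = {"POS": "N", "CP": {"G"}}
phrases = govern(head, dependent)
```

Under these assumptions the single result keeps the case of the head noun
and retains only the unsatisfied government value, so the new constituent
can still govern a dative later in the analysis.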

Preposition of the governed noun is rare: in expository writing it
only occurs in fixed expressions of the type GVARDII LEiTENANT ('guards
lieutenant'). In RG2, the output of a subrule analogous to (47) resulted in a
constituent NP ("noun phrase"), which further increased the number of
subsequent rules.

6.  Nouns  With  Cardinal  Numerals 

The (cardinal numeral)-noun constructions parallel those involving
numeral nouns (Part A2 of this section), and will consequently not be
discussed further here. With obvious adjustments in part-of-speech classes,
subrules (16.1)-(16.6) and (16.8)-(16.11) can serve as a model for the rules
needed.

A similar situation holds for the treatment of these constructions in
RG2, where the C3 constituent of such rules results in an RP ("numeral
noun phrase").

7. Adjective-Noun Constructions

Three  possibilities  must  be  considered:  (a)  preposed  adjectives 
governing  a  noun,  (b)  preposed  or  postposed  adjectives  agreeing  with  a 
noun,  and  (c)  predications  consisting  of  nouns  and  postposed  adjectives, 
both  in  the  nominative  case. 

RG2  rules 

RG2  rules  cover  all  three  possibilities. 

a. Adjectival government. By a rule analogous to (47), an adjective
governing a noun results in an adjective. Only a simple adjective is
allowed to govern.

(48)  A CP/Y GC/X
      + N CP/X
      = A CP/Y GC/1-X ETC/1

Since "noun phrases" (NP), "numeral noun phrases" (RP),
"adjective-noun phrases" (ANP) (discussed immediately below) and some
other noun-equivalent constituents are produced by RG2 rules,
however, the actual number of subrules of RG2 repeating the
conditions in (48) is considerable.

Postposition of governing adjectives was not considered because
it is generally avoided in modern expository writing.

b. Adjective-noun agreement. Agreement of preposed adjectives was
enforced by rules similar to those discussed for certain cardinal numeral
constructions and appositions.

(49)  A CP/X NG/Y
      + N CP/X NG/Y
      = ANP CP/X NG/Y ETC/2

Comment: An adjective agreeing with a noun in case, number,
and gender results in an adjective-noun phrase (ANP).
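
The agreement test of (49) can be pictured as an intersection of value
sets. The sketch below is illustrative only: the dictionary representation,
the helper names shared_values and adjective_noun, and the value codes are
assumptions, and the result keeps the whole set of agreeing values in one
constituent, which may differ from how the CSA system enumerates
alternatives.

```python
def shared_values(a, b, attributes):
    """Intersect the value sets of a and b for each named attribute.

    Returns None as soon as one attribute has no value in common,
    i.e. when the agreement test fails.
    """
    shared = {}
    for name in attributes:
        common = a.get(name, set()) & b.get(name, set())
        if not common:
            return None
        shared[name] = common
    return shared


def adjective_noun(adj, noun):
    """Sketch of (49): A CP/X NG/Y + N CP/X NG/Y = ANP CP/X NG/Y ETC/2."""
    if adj["POS"] != "A" or noun["POS"] != "N":
        return None
    shared = shared_values(adj, noun, ("CP", "NG"))
    if shared is None:
        return None
    anp = dict(noun)      # ETC/2: remaining tags come from the noun
    anp.update(shared)    # CP/X NG/Y: only the agreeing values survive
    anp["POS"] = "ANP"
    return anp


# Hypothetical coding: an adjective form ambiguous between nominative ("N")
# and accusative ("A"), masculine ("M"), next to a nominative noun.
adjective = {"POS": "A", "CP": {"N", "A"}, "NG": {"M"}}
noun = {"POS": "N", "CP": {"N"}, "NG": {"M"}, "ANIM": {"NO"}}
anp = adjective_noun(adjective, noun)
```

In this example the case ambiguity of the adjective is resolved by the
noun, and the extra tag on the noun is carried over by the ETC/2 copy.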

Since accumulative adjective strings resulted in the constituent
AP, an AP + N rule repeating the requirements of (49) was created.
Redundant analyses were avoided by not allowing "noun phrases" (NP
-- where a noun governs another noun), adjective-noun phrases (ANP),
and similar noun-dominated constituents to combine with preposed A
or AP constituents.

A rule for postposed adjectives agreeing with a noun is also present
in RG2. However, such a rule is destined to create difficulties in
analyses unless it is rigidly restricted. For instance, in a string like
GOSPLANOV SOHZNYX RESPUBLIK ('state planning commissions of
union republics'), the bracketing ((GOSPLANOV SOHZNYX) RESPUBLIK)
is incorrect, and a rule prohibiting, for instance, a noun phrase
containing a right-adjacent adjective from governing a noun in the same
case may be a reasonable palliative. However, this question requires
additional study along with a related question concerning the ability of a
noun to be simultaneously modified by adjectives on both sides.

c. Noun-adjective predications. The ability of postposed adjectives to
form a zero-copula predicate when the adjective agrees with a noun
is common in all genres of Russian. The following RG2 rule is typical
of those developed for such constructions.

(50)  N CP/N NG/X
      + A SC/Y-P CP/N NG/X
      = PRED CP/N NG/X ETC/1

Comment: A noun in the nominative agreeing with any postposed
adjective, other than a pronominal adjective (SC/Y-P, Table II-1),
results in a predication (PRED). Pronominal adjectives are excluded
because they typically require a dash between them and the
noun. A similar rule was provided for the ANP (adjective-noun
phrase).
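
Rule (50) combines a fixed value (CP/N), a bound variable (NG/X), and an
exclusion (SC/Y-P). The sketch below shows one way these three kinds of
test could be checked; the dictionary representation, the function name,
and the subclass code "Q" are assumptions for illustration, while "P" is
taken to be the pronominal subclass named in the comment.

```python
def noun_adjective_predication(noun, adj):
    """Sketch of (50): N CP/N NG/X + A SC/Y-P CP/N NG/X = PRED CP/N NG/X ETC/1."""
    if noun["POS"] != "N" or adj["POS"] != "A":
        return None
    # SC/Y-P: the adjective must have some subclass other than P (pronominal).
    if not (adj.get("SC", set()) - {"P"}):
        return None
    # CP/N on both constituents: each must allow the nominative reading.
    if "N" not in noun.get("CP", set()) or "N" not in adj.get("CP", set()):
        return None
    # NG/X: number-gender values must overlap.
    ng = noun.get("NG", set()) & adj.get("NG", set())
    if not ng:
        return None
    pred = dict(noun)                                    # ETC/1
    pred.update({"POS": "PRED", "CP": {"N"}, "NG": ng})
    return pred


# Hypothetical coding: "Q" stands for some non-pronominal adjective subclass.
subject = {"POS": "N", "CP": {"N", "A"}, "NG": {"F"}}
plain_adjective = {"POS": "A", "SC": {"Q"}, "CP": {"N"}, "NG": {"F"}}
pronominal_adjective = {"POS": "A", "SC": {"P"}, "CP": {"N"}, "NG": {"F"}}
pred = noun_adjective_predication(subject, plain_adjective)
blocked = noun_adjective_predication(subject, pronominal_adjective)
```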

Proposed  rules 

For the simple instances of adjective-noun constructions described
above, the RG2 rules are basically adequate but may require minor
adjustments. The more serious problems of adjective-noun agreement arise when
one or both of these constituents is coordinative54 or when, as described
next, agreement is affected by the presence of cardinal numerals.

8.  Nouns  with  Cardinal  Numerals  and  Adjectives 

Table II-9 summarizes the recommended usage in instances
involving numerals in the nominative and inanimate accusative case. In all
other cases, adjectives, numerals, and nouns agree in case and number.

RG2  rules 

In RG2, combinations of adjectives (A) and cardinal numerals (C)
result in the ACP (adjective numeral phrase) constituent, which is produced
by both the A + C = ACP and C + A = ACP rules. The ACP constituent acquires
the subclass of the numeral and all other tags of the adjective. The tests in
each instance are summarized in Table II-10.

Table II-9: Effect of Cardinal Numerals in Nominative and Accusative
on Adjective-Noun Agreement

                        Case of Adjective (A) as Affected by Position
                        Relative to Numeral (C) and Noun (N)

Cardinal                Initial      Medial Position       Final Position
Numeral    Case of      Position     (C A N)               (C N, A)
Subclass   Noun         (A C N)      Gender of Noun        Gender of Noun
                                     feminine   other      feminine  other

SC/S1      noun and     (non-        agrees in full        agrees in full
           numeral in   standard)    with nouns            with nouns
           full
           agreement

SC/S2,     genitive     nominative   gen. or    gen. pl.   nom. pl.  gen. or
SC/S3      singular     plural       nom. pl.                        nom. pl.

all        gen. pl.     nom. pl.     gen. pl.   gen. pl.   gen. pl.  gen. pl.
others

Table II-10: Summary of Test Conditions in RG2 Rules for
Cardinal-Numeral-Adjective Constituents (ACP)

Headings:  Case of Adjective (A) when Numeral (C) is SC/X-S1 or SC/S1;
           accusative or nominative; other

Entries:   Agrees in case, number, and gender.
           Agrees in case or numeral governs.
           Agrees in case.

The subrules for constructions involving ACP and noun constituents
did not provide for the nominative plural alternatives shown in Table II-9.

Proposed rules

Possible proposed rules are not provided in view of the fact that in
order to present a comprehensive packet, a number of agreement rules
would have to be stated which differ little in structural respects from many
of the other agreement rules discussed so far.

9. Preposition-Noun Phrases

In RG2, a number of simplifying assumptions had to be made
regarding the function of preposition-noun phrases. Generally, a
preposition-noun phrase immediately following a simple adjective, adverb, or
noun was linked to that constituent. Verbs and predications were allowed to
combine with preposition-noun phrases on their left and right. In instances
where a preposition-noun phrase occurred between some other potential
governor and a verb or a predication, it was linked to the latter. Such
necessary but arbitrary assumptions led to obvious consequences.

A study of preposition-noun phrases was undertaken during the
contract period. Information contained in the relevant sections of the Academy
Grammar and in other sources was coded on special forms and partially
classified. Concordances of some of the more frequent prepositions were
also studied.

RG2  rules 

RG2  rules  for  the  recognition  of  preposition-noun  phrases  (PNP) 
were  of  the  form 

(51)  P GC/X W/Y
      + N CP/X W/Z
      = PNP SC/Y GC/X W/Z ETC/2

Comment: A preposition (P) governing a noun (GC/X and CP/X on C1
and C2, respectively) together with that noun produces a
preposition-noun phrase (PNP) whose subclass (SC) is the actual preposition
(W/Y and SC/Y on C1 and C3, respectively) and whose "government" (GC)
is that on the preposition. Since the attribute W is mentioned in the
rule, the ETC instruction cannot copy it. Hence, W/Z on C2 is used
to copy the value of W onto C3.
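
The interaction between ETC and explicitly mentioned attributes in (51)
can be sketched as follows. This is an illustration under stated
assumptions: the dictionary representation, the helper etc_copy, and the
case codes "A" and "L" are invented for the example, and the treatment of
"mentioned" attributes follows only what the comment on (51) says about W.

```python
def etc_copy(source, mentioned):
    """Copy every tag of `source` that the rule does not mention."""
    return {name: set(values) for name, values in source.items()
            if name != "POS" and name not in mentioned}


def preposition_noun(prep, noun):
    """Sketch of (51): P GC/X W/Y + N CP/X W/Z = PNP SC/Y GC/X W/Z ETC/2."""
    if prep["POS"] != "P" or noun["POS"] != "N":
        return None
    governed = prep["GC"] & noun["CP"]       # X: case governed and present
    if not governed:
        return None
    # ETC/2 copies only the tags of the noun that the rule never names.
    pnp = etc_copy(noun, mentioned={"GC", "CP", "W", "SC"})
    pnp["POS"] = "PNP"
    pnp["SC"] = set(prep["W"])               # SC/Y: the preposition itself
    pnp["GC"] = governed                     # GC/X
    pnp["W"] = set(noun["W"])                # W/Z: explicit copy of the word
    return pnp


# Hypothetical coding for V DEREVNH: the preposition V is assumed to govern
# two cases ("A", "L"), and the noun form is assumed to be accusative.
prep = {"POS": "P", "W": {"V"}, "GC": {"A", "L"}}
noun = {"POS": "N", "W": {"DEREVNH"}, "CP": {"A"}, "NG": {"F"}}
pnp = preposition_noun(prep, noun)
```

Without the explicit W/Z step the word tag of the noun would be lost,
because etc_copy skips every attribute the rule mentions.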

PNP constituents do not actually govern, but the attribute (GC) was used for
another purpose: In the rule N + PNP = NPNP, the government of the NPNP
(noun-prepositional noun phrase) constituent was taken to be that of the noun
less the "government" of the PNP, thereby providing for government across
the PNP in a number of instances. For example, PRIEZD V DEREVNH
OTQA would be correctly assigned two interpretations: (a) 'father's arrival
in the village' and (b) 'arrival in father's village'. However, many
counterexamples can be readily found where this rule turns out either to be too
permissive or too restrictive.

Several such phrases were chained together into a PNP "block"
(PNPB) by rules of the form PNP + PNP = PNPB and PNP + PNPB = PNPB.
The mechanism of such rules was discussed in Section 2.1.3. The purpose
of this construction was to group a string of PNP constituents without
attempting to analyze the string further.

Proposed rules

Generally, rules involving PNP constituents are in need of reworking
because of recurrent minor inconsistencies. The corrections of the
inconsistencies observed are not presented here in detail because of the need for
further study of the deeper grammatical problems involved with prepositional
phrases.

B.  Predications 

Although subrules resulting in predications account for about one
fifth of the 740 subrules in RG2, only a limited study of predications could
be undertaken during the period covered by this report. The complexity of
the task becomes apparent when one briefly examines the problems specific
to each type of predication. A discussion of types of predications (1) is
followed by illustrations of the manner in which predications were treated in
RG2 rules (2). A comment about possible new rules (3) concludes this
section.

1.  Types  of  Predications 

Three types of surface-structure predication constituents can be
distinguished: (a) complete predications (subject and predicate are identifiable);
(b) subjectless predications (only the predicate is present); and (c)
nominative predications (only the subject is present).

Complete  predications 

These predications, also known as personal predications, consist
of a subject and a predicate. Standard references generally provide a
good description of constituent types which can act as subjects in such
constructions. Predicates can either be simple (a single finite form of a verb)
or can consist of a verb and its complement. Some of the problems of
predicates of the latter type are discussed under subclassification of verbs
(Section 2.2.2), but the question of their structure requires further study. The
common instances of predicative (subject-predicate) agreement are equally
well described in the same standard references and will therefore not be
elaborated upon here. Some of the more difficult problems in predicative
agreement include various types of synesis, for example: $3TO BYLA
VAJNA4 ZADACA ('this was an important task'), or $BOL6WINSTVO BYLI
STUDENTAMI ('the majority were students').



Subjectless predications

The Moscow University Grammar contains a comprehensive
discussion of this subject which is not repeated here. It is important to
distinguish between personal elliptical predications and impersonal predications.
Within the former group, there are three subgroups which are usually
referred to as "definite personal" (OPREDEL4H 'I define'); "indefinite
personal" ($STALI SOSTAVL4T6 SPISKI '(they) began to compile lists'); and
"generalized personal" (typical only of proverbs or belles lettres genres;
the second person singular has generalized meaning: $NE POI5EW6, NE
NAiDEW6 (literally: 'If thou shalt not seek, thou shalt not find' -- 'One who
does not seek does not find (anything)')). The recognition of indefinite
personal predications, which are frequent in all styles, is difficult without
adequate information regarding the type of subject and objects a verb requires
and other features discussed in Section 2.2.2. For instance, a sentence
like $PLANY IZMENILI V PROWLUH P4TNIQU, without restrictions, would
be analyzed correctly as 'Plans were changed last Friday' (literally, '(they)
changed plans last Friday') and incorrectly as 'Plans changed last Friday'.

Among impersonal predications it is possible to single out several
distinct construction types involving the use of verbs, short-form passive
participles, infinitives, impersonal predicates in "-O", and negated
predicates. As noted above, details can be found in the Moscow University
Grammar. A number of points require a brief mention here from the point
of view of difficulties created in recognition, however.

When an impersonal predication consists of a verb, it is necessary
to distinguish between impersonal verbs and impersonal usage of personal
verbs: e.g., SVETAET 'the dawn is breaking' as opposed to GUBY
PODERGIVALO 'lips were twitching' (literally: 'something twitched the lips'). (Cf.
Section 2.2.2.) Since impersonal forms of the latter type, which are third
person singular (neuter in the past), are identical to personal forms, they
are subject to ambiguity problems similar to those mentioned for indefinite
personal predications.

In those instances where the predicate contains a short-form passive
participle and an infinitive, detailed rules necessary to obtain correct
analyses remain to be worked out. For instance, *REWENIE PRIN4TO
POSLAT6, but: $REWENIE PRIN4TO POSYLAT6 ('It is the custom to send the
decision').

In some constructions involving impersonal predicates in "-O", and
some reflexive verbs, dative forms (dative of the "logical subject" as
opposed to dative indirect object) create ambiguities. For example:
STUDENTAM NEOBXODIMO DOKAZAT6 TEOREMU -- (1) 'Students must prove
the theorem', (2) 'It is necessary to prove the theorem for the students'.
The same problem arises with reflexives like POLAGALOS6 ('it was
necessary'). In the case of reflexive verbs, additional problems are caused by
the relative position of the dative form. For instance: STUDENTAM
XOTELOS6 POKAZAT6 SVOI ZNANI4 ('Students wanted to show ('felt like
showing') their knowledge') and XOTELOS6 POKAZAT6 SVOI ZNANI4
STUDENTAM ('One wanted to show one's knowledge before the students').

Nominative  predications 

This type of predication is encountered in so-called one-word
sentences. Three subtypes can be distinguished: (1) existential ($VECER
'evening'; $POJAR! 'fire!'); (2) appellative ($APTEKA 'pharmacy'); (3)
demonstrative ($VOT PRIMER 'Here is an example'). With the exception
of some of the demonstrative sentences, this type of sentence is almost
exclusively found in belles lettres and will not, therefore, be discussed
further.62

2.  RG2  Rules 

Some two-thirds of the RG2 rules resulting in predications were
intended for the recognition of complete predications. Of the remaining third,
some twenty rules were intended to recognize instances where predications
were combined with other constituents, a comparable number of rules was
introduced for impersonal predications, and about five rules were temporary
ad hoc rules which will not be discussed.

Complete  predications 

About two-thirds of the rules for the recognition of complete
predications involved instances where the predicate was expressed by a finite
verb form. The remainder related to predicates whose complements were
short forms of adjectives and participles linked by a zero-verb form.
Subjects of complete predications in RG2 were either nouns (N), personal
pronouns (M), or constructions dominated by them.

In the dictionary, three subclasses of finite verb forms (V) were
identified: (a) imperative (SC/M); (b) impersonal (SC/I); and (c) personal
(SC/P). Forms of BYT6 ('to be') and 4VL4T6S4 ('to be') coded as AUX
(auxiliary) in combination with adverbs (D), infinitives (F), and short forms
of adjectives and participles (SF) resulted in finite verb constituents (V)
whose tense and person were those of the AUX. Dash (U SC/DASH) in
combination with a noun phrase (NP) or a preposition-noun phrase (PNP)
resulted in a present tense finite verb form.63

The agreement requirements for finite-verb predicate constructions
are illustrated in the following rule.


(52)  N  CP/X  NG/Y 

+  V  CP/X  NG/Y 

=  PRED  SC/SV  CP/X  NG/Y  ETC/2 

Comment: Since the attribute CP contains both person and case
information, agreement in case-person is required. Present tense verbs,
because of the number and gender combination in NG, are given all
three genders in the singular and the value P in the plural; past tense
forms are assigned only the actual gender values. Predication
constituents are assigned the attribute subclass (SC) whose values
indicate the subject-predicate sequence (SV for this subrule). The rules
for combining "verbs" resulting from the combination of a dash with
a potential complement are much weaker, requiring only that the noun
be in the nominative and the "verb" be of the type mentioned.
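
The NG coding described in the comment on (52) can be sketched as a small
function. The gender codes "M", "F", and "N", the function names, and the
use of P for past plural forms are assumptions made for this example; only
the value P for the plural and the "all three genders" convention for the
present singular come from the comment. The CP (case-person) test is left
out because its coding is not spelled out here.

```python
SINGULAR_GENDERS = {"M", "F", "N"}   # hypothetical codes for the three genders
PLURAL = "P"                         # the plural value named in the comment


def verb_ng(tense, number, gender=None):
    """NG values for a finite verb form, following the comment on (52)."""
    if number == "plural":
        # Assumption: past plural forms, which carry no gender, also get P.
        return {PLURAL}
    if tense == "present":
        # Present singular forms do not mark gender, so all three are listed.
        return set(SINGULAR_GENDERS)
    if gender not in SINGULAR_GENDERS:
        raise ValueError("a past singular form needs a gender")
    return {gender}


def ng_agrees(noun_ng, verb_ng_values):
    """The NG/Y test of (52): the two value sets must overlap."""
    return bool(noun_ng & verb_ng_values)


masculine_subject = {"M"}
present_ok = ng_agrees(masculine_subject, verb_ng("present", "singular"))
past_feminine_ok = ng_agrees(masculine_subject, verb_ng("past", "singular", "F"))
```

Listing all three genders on present singular forms lets the same
intersection test serve for both tenses, at the cost of larger tag sets on
present tense entries.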

In  addition,  where  applicable,  rules  of  the  type  (52)  were  followed  by 
government  rules  of  the  type  (53),  which  are  analogous  to  other  government 
rules,  for  instance,  (47). 

(53)  N CP/X
      + V CP/Y GC/X
      = NVP CP/Y GC/2-X ETC/2

The NVP "noun-verb phrase" constituent was introduced in part in an
attempt to prevent redundant analyses.

The  approach  employed  for  recognition  of  zero-verb  form  predicates 
is  illustrated  in  (54). 

(54)  N  CP/N  NG/X 

+  SF  CP/N  NG/X 

=  PRED  SC/NSF  CP/N  NG/X  ETC/2 

Comment: A noun (N) in the nominative (CP/N) agreeing in
number-gender (NG) with a short form of an adjective or participle produces
a predication constituent (PRED) of the noun short form subclass
(SC/NSF).

Predications  combined  with  other  constituents 

The most frequent instances involved cases like the two following:

a. In KNIGU BRAT PRODAL ('brother sold the book'), BRAT and PRODAL
produce a predication of the type handled by (52). In order to complete
the analysis, the predication was allowed to govern a left-adjacent noun
if the verb was "blocked" by the subject (SC/SV).

(55)  N CP/X
      + PRED GC/X SC/SV
      = PRED SC/NSV GC/2-X ETC/2

The above rule is similar to (53) in the tests carried out.

b. A similar situation could occur if the example in (a) were preceded by
a preposition-noun phrase (PNP): V GORODE KNIGU BRAT PRODAL
('brother sold the book in the city'). Since, as noted in Section 2.1.3,
a noun was not allowed to pick up a left-adjacent PNP, it was necessary
to add a rule similar to (55) where a PRED constituent whose verb was
"blocked" by a noun could pick up a left-adjacent PNP constituent.

Impersonal  predications 

In an attempt to highlight the problems involved in the recognition of
impersonal predicates ("words of the category of state"), a special
constituent class IP was created consisting of forms in "-O" (e.g., VAJNO 'it is
important'). IP constituents produced by RG2 rules resulted from a
dictionary item (IP) combining, for instance, an infinitive (F) with a governed
constituent. For example, VAJNO POLUCIT6 ('it is important to receive')
is analyzed as an IP constituent by a rule of the type IP + F = IP. Since
impersonal predicates can govern other constituents on either side,
redundant analyses can occur. Such analyses are avoided by employing a device
of constituent renaming similar to that described for verbs in Section 2.1.3.

Other  predications 

In an effort to deal with elliptical predications, every finite verb
form in the dictionary has been supplied a PRED alternative. The economy
of this approach is questionable. The need for supplying PRED alternatives
to finite verb forms can be eliminated by merging the PRED and verb
constituents into a single class. This possibility was considered, but the
change was not carried through because of the need for further study.

3.  Proposed  Rules 

Subject to the elimination of superficial inconsistencies, RG2 rules
for recognition of predication constituents provide a basic framework which
can be improved by additional restrictions suggested in this section and in
Section 2.2.2. Given the necessary information, a new or modified set of
rules can be written in a relatively short time. However, a considerable
amount of time will be required for testing and debugging.

C. Compound Constituents

RG2 contains a number of rules intended for the recognition of
compound constituents. These rules have serious limitations which can be
eliminated only after a detailed study of coordinative compounds.


RG2 rules


The Russian conjunction I ('and') and the comma have been
designated as special constituents: AND and CMA. A given constituent in
combination with AND or CMA results in an ANDB (and-block) or CMAB
(comma-block) construction whose subclass (SC) is the class name of the
constituent with which the AND or CMA combined. For instance, the rules for
nouns (N) are as follows.

(56)  AND + N = ANDB SC/N ETC/2

(57)  CMA + N = CMAB SC/N ETC/2

Several CMAB constructions combine to form a CMAP (comma-phrase)
construction by one of the two rules shown below.

(58)  CMAB SC/X CP/Y
      + CMAB SC/X CP/Y
      = CMAP SC/X CP/Y ETC/1

(59)  CMAB SC/PNP
      + CMAB SC/PNP
      = CMAP SC/PNP ETC/1

Comment: The two CMAB constructions must agree in subclass (SC)
and case-person (CP) in order to be combined into a CMAP
construction, except in instances where the CMAB results from a
preposition-noun phrase (SC/PNP).

In order to allow for strings consisting of more than two CMAB
constructions, the requirements of (58) and (59) are repeated in rules of the form
CMAB + CMAP = CMAP. Similarly, CMAB and ANDB result in a CMAP
construction, subject to the same restrictions.

To complete the treatment of noun constructions, a noun (N) in
combination with a following ANDB, CMAB, or CMAP construction of the noun
subclass (SC/N) results in an NPB -- "noun phrase block". The
requirements are illustrated in

(60)  N SC/X CP/Y
      + ANDB SC/N CP/Y
      = NPB SC/X CP/Y ETC/1
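
The way rules (56)-(60) build up a compound such as STOL, STUL I KROVAT6
can be traced with a short sketch. The dictionary representation, the
function names, and the case codes are assumptions made for illustration;
the sketch keeps only the SC and CP tests stated in the rules and ignores
all other tags.

```python
def make_block(kind, c):
    """(56)/(57): AND + X = ANDB SC/X, CMA + X = CMAB SC/X."""
    if kind not in ("ANDB", "CMAB"):
        raise ValueError("kind must be ANDB or CMAB")
    return {"POS": kind, "SC": c["POS"], "CP": set(c.get("CP", set()))}


def join_blocks(left, right):
    """(58)/(59) and their extensions: CMAB + CMAB/CMAP/ANDB = CMAP."""
    if left["POS"] != "CMAB" or right["POS"] not in ("CMAB", "CMAP", "ANDB"):
        return None
    if left["SC"] != right["SC"]:
        return None
    cp = set(left["CP"])
    if left["SC"] != "PNP":          # (59) drops the CP test for PNP blocks
        cp &= right["CP"]
        if not cp:
            return None
    return {"POS": "CMAP", "SC": left["SC"], "CP": cp}


def noun_block(noun, block):
    """(60): N SC/X CP/Y + ANDB/CMAB/CMAP SC/N CP/Y = NPB SC/X CP/Y ETC/1."""
    if noun["POS"] != "N" or block["POS"] not in ("ANDB", "CMAB", "CMAP"):
        return None
    if block["SC"] != "N":
        return None
    cp = noun["CP"] & block["CP"]
    if not cp:
        return None
    npb = dict(noun)                 # ETC/1: remaining tags from the noun
    npb.update({"POS": "NPB", "CP": cp})
    return npb


# Hypothetical coding: all three nouns are ambiguous between nominative
# ("N") and accusative ("A").
stol = {"POS": "N", "CP": {"N", "A"}}
stul = {"POS": "N", "CP": {"N", "A"}}
krovat = {"POS": "N", "CP": {"N", "A"}}

tail = join_blocks(make_block("CMAB", stul), make_block("ANDB", krovat))
npb = noun_block(stol, tail)
```

The trace mirrors the bottom-to-top order of the rules: ", STUL" and
"I KROVAT6" become blocks, the blocks join into a CMAP, and the leading
noun absorbs the CMAP to give the NPB.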


The types of rules just discussed work well for the more frequent
types of compound expressions, for instance the STOL, STUL I KROVAT6
('table, chair, and bed') type. However, superfluous analyses frequently
result because of confusion with detached constituent parts. (Cf. Section
2.1.4D.) Some other problems are brought out in the immediately
following discussion of proposed improvements.


Proposed Improvements


Coordinative ties are generally poorly described in existing Russian
grammars.64 The essential conditions for the existence of coordinative
ties are that (a) the conjuncts in any given construction must perform the
same syntactic function and (b) they must refer to different but homogeneous
(odnorodnye) concepts. The first point can be illustrated by the following
example containing a coordinative subject: IX BESKONECNOE "4 NE
ZNAH" I ONI SAMI NAS RAZDRAJALI ('Their endless "I don't know" and
they themselves irritated us'). The second requirement is much more
elusive and controversial, and requires additional study.

The RG2 rules rely on a limited set of formal features which seem
to be the only ones that can be used at present. Since the attribute CP
combines both case and person, the requirements can be stated as follows: all
conjuncts in a coordinative compound must be of the same constituent
(part-of-speech) class, declined conjuncts must agree in case, and conjugated
conjuncts (verbs) must agree in person and number.

Usually, coordinative ties are marked by special coordinative
conjunctions which in Russian can be subdivided into three groups according
to their patterns of recurrence in individual constructions.

(1) Conjunctions which are typically used only once in a given construction
but which can be repeated to signal expressive usage: I ('and'), NO
('but'), ILI ('or'), and some others.

(2) Conjunctions forming constituents used in pairs, or paired conjunctions.
For instance, NE TOL6KO ..., NO I ... ('not only ... but also ...').
In rare instances, one of the conjunctions can be repeated to lend added
expressiveness: NE TOL6KO $IVANOV, NE TOL6KO $PETROV, NO
I VSE RABOCIE ... ('not only Ivanov, not only Petrov, but all of the
workers ...').

(3) Iterative conjunctions which are used at least twice in a given string.
For instance, NI $IVANOV, NI $PETROV, NI $SIDOROV ... ('neither
Ivanov, nor Petrov, nor Sidorov ...').

The  following  four  types  of  coordinative  compounds  can  be  considered 
as  being  basic  ior  Russian.  Their  general  form,  using  N  to  ’•epresent  a 
syntactic  alternative  and  CJs, CJp,  and  CJi  to  represent  single  conjunctions, 
paired  conjunctions,  and  iterative  conjunctions,  respectively,  is  as  follows. 

(a) Asyndetic construction (N, N, N): [ONI] RABOTALI, SOZDAVALI,
STROILI ('[they] worked, created, built').

(b) Standard construction (N (,) CJs N): OPYTNY1 I ZASLUJENNY1
STOL4R ('an experienced and distinguished joiner').
Note: Both of the above construction types can be extended indefinitely
by adding comma-separated N on the left.

(c) Paired construction (CJp N, CJp N): NE TOL6KO PISAT6 NO I
CITAT6 ('not only to write but also to read').

(d) Iterative construction (CJi N, CJi N): NI STUL, NI STOL ('neither
a chair, nor a table').

Note: This construction can be indefinitely extended by adding (CJi N,)
on the left.

In the following presentation of the general form of coordination rules
which can be proposed in place of those of RG2, it is assumed that
grammatical ties are satisfied by agreement of CP attributes in each case,
but, for purposes of simplicity of exposition, this test is not explicitly
shown.

The following notational conventions will be employed in the proposed
rules: We shall introduce for conjunctions (CJ) an attribute SUBCL
(subclass) with values SINGLE, PAIRED, and ITERAT (iterative) and for the
four construction types an attribute CONSTR (construction) with values
ASYND (asyndetic), STD (standard), PAIRED (paired), and ITERAT
(iterative). Thus, an asyndetic construction is N CONSTR/ASYND. Comma is
regarded as belonging to a separate constituent class CMA.

The  analysis  proceeds  as  follows. 

Step 1. Since the combination of CMA and N results in N CMABL/PLUS
        (61), and the combination of CJ and N results in N CJBL whose
        values are copied from the subclass (SUBCL) of the conjunction
        (62), a single N must not have either of the two attributes
        (CMABL/*, CJBL/*). In addition, it may not have the attribute
        CONSTR (CONSTR/*), which is reserved for complete coordinations.

        (61) CMA
             + N CONSTR/* CMABL/* CJBL/*
             = N CMABL/PLUS ETC/2

        (62) CJ SUBCL/X
             + N CONSTR/* CMABL/* CJBL/*
             = N CJBL/X ETC/2

        Because some nouns which have combined with a conjunction must
        also combine with a comma, the following additional rule is
        necessary.

        (63) CMA
             + N CONSTR/* CMABL/* CJBL/X
             = N CMABL/PLUS CJBL/X ETC/2
Step 2. In order to account for the possibility of single conjunctions and
        parts of paired conjunctions being repeated, a rule is added to
        combine successive conjuncts involving them into a single
        constituent:

        (64) N CONSTR/* CJBL/X-ITERAT CMABL/* NCXI/*
             + N CONSTR/* CJBL/X-ITERAT CMABL/PLUS
             = N CJBL/X NCXI/PLUS ETC/2

        Comments: This rule does not apply to iterative construction
        conjuncts (CJBL/X-ITERAT). NCXI is a locally important
        tag which is intended to avoid redundant analyses. Note that
        such repetitions of conjunctions for emphasis must be
        separated by commas (CMABL/PLUS on C2).

        A similar rule is introduced for combining portions of asyndetic
        constructions:

        (65) N CONSTR/* CJBL/* CMABL/PLUS NCXI/*
             + N CONSTR/* CJBL/* CMABL/PLUS
             = N CMABL/PLUS NCXI/PLUS ETC/1

Step 3. Recognition of asyndetic strings can be completed by the following
        rule.

        (66) N CONSTR/* CJBL/* CMABL/*
             + N CONSTR/* CMABL/PLUS CJBL/*
             = N CONSTR/ASYND ETC/2

        Comment: C2 must not have the CJBL attribute (CJBL/*)
        in order to prevent output of (62) from combining erroneously.

Step 4. Recognition of standard constructions can be completed by the
        following rule.

        (67) N CONSTR/(ASYND) CJBL/* CMABL/*
             + N CONSTR/* CJBL/SINGLE CMABL/(PLUS)
             = N CONSTR/STD ETC/2

        Comment: The values of CONSTR on C1 and of CMABL on C2
        can be verbalised as follows: "The attribute can be
        missing, but if it is not, its value must be the one enclosed
        in parentheses."

Step 5. Recognition of paired constructions can be completed by the
        following rule.

        (68) N CONSTR/* CJBL/PAIRED CMABL/*
             + N CONSTR/* CJBL/PAIRED CMABL/PLUS
             = N CONSTR/PAIRED ETC/2

        Comment: Paired conjunction constructions should be
        additionally differentiated to identify the first and the second
        half of the pair. This is not taken into account in (68).

Step 6. Recognition of iterative constructions can be completed by the
        following rule.

        (69) N CONSTR/* CJBL/ITERAT CMABL/*
             + N CONSTR/* CJBL/ITERAT CMABL/PLUS
             = N CONSTR/ITERAT ETC/2

        Comment: In certain iterative constructions, comma is not
        required, but this possibility is not provided for in the
        present set of rules.
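
The interplay of rules (61)-(69) can be traced with a small sketch in Python. It is an illustration under stated assumptions, not the CSA implementation: `/*` is read as "attribute absent", a parenthesized value as "absent or equal to that value", and `X-ITERAT` as "any value except ITERAT"; the variable X in (64) is assumed to be the same on both constituents; the ETC tag and the CP agreement test are omitted. The names `make`, `combine`, `parse`, and `constructions` are hypothetical.

```python
PLUS = "PLUS"


def make(cat, **tags):
    """Build a hashable constituent: (class name, frozenset of attribute/value pairs)."""
    return (cat, frozenset(tags.items()))


def combine(c1, c2):
    """Return every constituent that rules (61)-(69) build from the pair c1 + c2."""
    cat1, t1 = c1[0], dict(c1[1])
    cat2, t2 = c2[0], dict(c2[1])
    if cat2 != "N" or "CONSTR" in t2:        # every rule needs C2 = N CONSTR/*
        return []
    cj2, cma2 = t2.get("CJBL"), t2.get("CMABL")
    if cat1 == "CMA":                        # (61) and (63): comma + N
        if cma2 is not None:
            return []
        return [make("N", CMABL=PLUS, **({"CJBL": cj2} if cj2 else {}))]
    if cat1 == "CJ":                         # (62): conjunction + bare N
        if cma2 is None and cj2 is None:
            return [make("N", CJBL=t1["SUBCL"])]
        return []
    if cat1 != "N":
        return []
    constr1, cj1, cma1 = t1.get("CONSTR"), t1.get("CJBL"), t1.get("CMABL")
    out = []
    if constr1 is None and cma2 == PLUS:
        if (cj1 not in (None, "ITERAT") and cj2 == cj1
                and cma1 is None and "NCXI" not in t1):
            out.append(make("N", CJBL=cj1, NCXI=PLUS))        # (64)
        if cj1 is None and cj2 is None and cma1 == PLUS and "NCXI" not in t1:
            out.append(make("N", CMABL=PLUS, NCXI=PLUS))      # (65)
        if cj1 is None and cj2 is None and cma1 is None:
            out.append(make("N", CONSTR="ASYND"))             # (66)
        if cma1 is None and cj1 == cj2 and cj1 in ("PAIRED", "ITERAT"):
            out.append(make("N", CONSTR=cj1))                 # (68), (69)
    if (constr1 in (None, "ASYND") and cj1 is None and cma1 is None
            and cj2 == "SINGLE" and cma2 in (None, PLUS)):
        out.append(make("N", CONSTR="STD"))                   # (67)
    return out


def parse(tokens):
    """Exhaustively combine adjacent constituents bottom-up (chart over spans)."""
    n = len(tokens)
    chart = {(i, i + 1): {tok} for i, tok in enumerate(tokens)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            chart[i, j] = {c for k in range(i + 1, j)
                           for c1 in chart[i, k] for c2 in chart[k, j]
                           for c in combine(c1, c2)}
    return chart[0, n]


def constructions(tokens):
    """Sorted CONSTR values of complete coordinations spanning the whole string."""
    return sorted(dict(tags)["CONSTR"] for _, tags in parse(tokens)
                  if "CONSTR" in dict(tags))


N, CMA = make("N"), make("CMA")


def CJ(subcl):
    """A conjunction constituent with the given SUBCL value."""
    return make("CJ", SUBCL=subcl)
```

With these definitions, `constructions([N, CMA, N, CMA, N])` yields a single asyndetic analysis, since the NCXI restriction in (65) admits only one bracketing of the comma-marked conjuncts. For the paired string `[CJ("PAIRED"), N, CMA, CJ("PAIRED"), N]`, both (64) and (68) apply to the same pair; only the output of (68) carries CONSTR and counts as a complete coordination.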

In the proposed rules considered so far, N has been assumed to be
either a constituent spanning a single word or else an endocentric
construction. However, in every one of the four basic strings each such
simple N can be replaced by an N construction, subject to the restrictions
in Table II-11.


Table II-11: Possible N Constituents in Compound Constructions

                    Possible N constituents in the construction

                            N          N          N          N
               Single    CONSTR/    CONSTR/    CONSTR/    CONSTR/
Construction     N       ASYND      STD        PAIRED     ITERAT

Asyndetic        +         -          +          +          -
Standard         +        +(1)       +(2)        +         +(2)
Paired           +         +          +          -          -
Iterative        [only three entries legible: -, +(2), -]

(1) Seems generally possible except for instances where the conjunction
    used in the higher-order construction requires punctuation.

(2) The same conjunction cannot be used to form the N construction and
    to combine it with other members of the higher-order construction.

Table II-11 provides a general framework which can be used in
recognition of coordinative constructions. Minor adjustments may be
necessary for nouns and adjectives which can form accumulative strings not
considered here. Information presented in Table II-11 can be incorporated
without difficulty into the types of rules shown in (61)-(69). However, since
some twenty-eight rules would be required, they are not presented.
Because the information presented in Table II-11 could only be
hand-tested, it is difficult to establish which of the constructions are
actually possible. Normally, the constructions are symmetrical, i.e., the
same N-constituent is used throughout. For instance, $ON CITAL
$DOSTOEVSKOGO I $BAL6ZAKA, $TURGENEVA I $TOLSTOGO ('He read Dostoyevsky
and Balzac, Turgenev and Tolstoy') contains an asyndetic compound of
"standard" coordinative constructions. However, asymmetry is also possible:
$E1 NRAVILIS6 KRASNYE ILI ZELENYE, NO NE CERNYE LENTY ('She
liked red or green ribbons but not black ones'), where a "standard"
coordinative compound consists of a "standard" coordinative construction and
a single N. One of the major difficulties in the study of coordinative
compounds is that there are a variety of options, with the choice among them
strongly affected by stylistic considerations. Moreover, in addition to
developing restrictions based on semantic compatibility (cf. footnote 65),
it is necessary to consider the syntactic function of conjuncts. For
instance, the sentence NE STOL6KO BRAT SKOL6KO OTEQ PROCITAL KNIGU
('not so much the brother as the father read the book'), where the paired
conjunction is in the subject and hence cannot exhibit its additional
adverbial properties, is at best marginal. However, ON BYL NE STOL6KO
BRATOM, SKOL6KO OTQOM ('he was more of a father than a brother'),
where the same paired conjunction occurs in the predicate, is standard
usage.

D. Punctuationally Delimited Constituents

As noted in Section 2.1.2, RG2 includes a small number of rules
intended for the recognition of subordinate clauses and detached
constituents. In text, both constituent types are set off by detaching
punctuation, the recognition of which involves a number of difficulties,
some of which are described below.

Detaching  punctuation 

Detaching punctuation is usually in the form of paired commas, which
are sometimes replaced by dashes or a combination of a comma and a dash.
For a string B which is dominated by a string A and detached, the expected
pattern of punctuation is (A, B, ). However, in sentence-final position, the
same pattern is of the form (A, B. ) and may be difficult to distinguish from
an asyndetic coordinative compound. For instance, $MY VIDELI DVUX
PTIQ, DVUX UTOK may be interpreted as either 'We saw two birds, (i.e.)
two ducks' or 'We saw two birds (and) two ducks'. Similar problems can
occur in sentence-initial position: $T4JELO BOL6NYE, OFIQERY I
SOLDATY NAPRAVL4LIS6 V TYL. This sentence has two possible
interpretations: (a) 'Being seriously ill, the officers and soldiers were
sent to the rear' or (b) 'The seriously ill, the officers, and the soldiers
were sent to the rear'.


Instances where a dash and a comma are used in combination can be
even more confusing. For example, $JITELI GORODA - MUJCINY I
JEN5INY, ODETYE PO-PRAZDNICNOMU, OJIDALI PRIEZDA GOSTE1
can be interpreted correctly as (a) 'Inhabitants of the city -- men and women
dressed in their Sunday best -- awaited the arrival of visitors' or
incorrectly as (b) 'The inhabitants of the city are men, and women dressed
in their Sunday best awaited the arrival of visitors'.

Related to the problems involving boundary recognition is the problem
of "bridging" detached constituents. For instance, in $ONI PELI, VY1D4
NA SQENU, PESNI I ROMANSY, in addition to the correct interpretation,
(a) 'Having entered the stage, they sang songs and romances', an incorrect
interpretation, (b) 'Having entered the stage, the songs and the romances,
they sang', is difficult to avoid.

Finally, a recurrent problem is that both clauses and detached
constituents can form coordinative compounds, in which case the punctuation
may be dramatically different from what would normally be expected: e.g.,
$MY GUL4LI I KOGDA SVETILO SOLNQE I KOGDA WEL DOJD6, 'We
took walks both when the sun was shining and when it rained', where the
comma that would normally precede each KOGDA clause is omitted.

Subordinate  clauses 


3 
8 

4 

£ 

i 

i 
In RG2 rules, subordinate clauses are treated on a token basis
essentially limited to KOTORY1-relative clauses and CTO- and CTOBY-clauses.
The latter two constructions are further limited to instances where they are
governed by either a noun or a verb (e.g., NADEHS6, CTO ... ('I hope that
...')). KOTORY1-clauses created difficulties when the relative pronoun was
"buried" in the clause: ... CELOVEK, NA USTALOM LIQE KOTOROGO
BYLA ZAMETNA ULYBKA, ... ('... a man on whose tired face one could
notice a smile ...'). Some methods for coping with this problem have been
considered but not actually implemented: gender, number, and animateness
of the pronoun can be recorded in a special tag and this tag carried as the
attribute of the clause.


In addition to problems of formal agreement, future research must
take into account various selection restrictions on combinations of relational
words (relative pronouns) with other constituents. For instance, since in
... KOMANDIR GARNIZONA, KOTORY1 BYL JENAT NA ... ('... commander
of the garrison, who was married to ...'), KOTORY1 is the subject
of a verb requiring an animate subject, it can refer only to an animate noun.

The subordinating conjunctions were also investigated during the
contract period. However, the results are inconclusive. Such important
information as the relative position of the subordinate clause with respect
to the main clause, the ability or inability of a relative clause to refer to
a single word or act as a sentential modifier, tense agreement between the
verbs in both the main and the subordinate clauses and many other problems
are dealt with rather superficially in the Academy Grammar and will require
additional study.

Detached  constituents 

The attempts to deal with detached constituents have been generally
satisfactory in the most obvious instances, i.e., when an adjective or a
gerund is in sentence-medial position and is clearly delimited by punctuation
which can be absorbed into the constituent. Because the large number of
rules that would have been required to handle them adequately in RG2 were
not included, detached constituents in sentence-final position were often
incorrectly identified only as parts of coordinative compounds, with which
they are superficially identical. Moreover, because of the absence of any
definitive study on the subject, restrictions on the ability of a constituent
containing a detached constituent to combine with other constituents were
not worked out in a satisfactory way.

E.  The  CSA  Syntactic  Dictionary 

In order to conduct tests on the 160-sentence sample of Pravda
editorials, an experimental dictionary of about 1650 Russian word forms and
punctuation symbols, covering the vocabulary and punctuation in the sample,
was compiled. Linguistic coding was completed for a similar full-form
dictionary covering the entire 1600-sentence corpus of Pravda editorials.
However, these additional entries were not incorporated into the CSA
dictionary because the machine processing necessary to perform this operation
was not completed.

The entries in the CSA dictionary consisted of the Russian word forms
or punctuation symbols followed by their respective syntactic alternatives.
For instance, the entry for the form GLUBOKO1 ('deep') was:

(70) *GLUBOKO1

     1 A SC/A CP/G NG/F W/GLUBOKO1
     2 A SC/A CP/D NG/F W/GLUBOKO1
     3 A SC/A CP/I NG/F W/GLUBOKO1
     4 A SC/A CP/L NG/F W/GLUBOKO1

Comment: GLUBOKO1 is classified as an adjective (A) whose subclass
(SC) indicates that it is a positive degree adjective proper (A) (see
Table II-1). The attribute W (word) has the actual word form as its
value (W/GLUBOKO1). All of the alternatives are feminine singular
(NG/F) adjectives. (For the meaning of the attribute NG, see Section
2.1.3.) The alternatives differ in the value of the case-person (CP)
attribute (cf. 2.1.3) which has values genitive (G), dative (D),
instrumental (I), and locative (L).

The entry for the dash (--) was as follows:

(71)

     1 U SC/DSH W/DASH

Comment: In RG2, all punctuation symbols, with the exception of
comma, are assigned to the constituent class U (U). The SC/DSH
is the subclass "dash" of U. The tag W contains the actual name
of the symbol.

A  detailed  description  of  the  constituent  classes  and  tags  appearing 
in  the  CSA  dictionary  is  not  provided  here  because  the  pertinent  information 
has  already  been  presented  in  the  discussion  of  selected  rules  of  the  CSA 
Russian  grammar. 
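
The layout of the syntactic alternatives in (70) and (71), a sequence number, a constituent class, and a string of attribute/value tags, can be read mechanically. The following sketch assumes whitespace-separated fields and `ATTRIBUTE/VALUE` tags exactly as printed above; the function name `parse_alternative` is hypothetical and is not taken from the Dictionary Assembly/Update system.

```python
def parse_alternative(line):
    """Split one syntactic alternative into (sequence number, class, tags).

    Example input: "1 A SC/A CP/G NG/F W/GLUBOKO1".
    """
    fields = line.split()
    number, constituent_class = int(fields[0]), fields[1]
    # Each remaining field is an attribute/value pair separated by "/".
    tags = dict(field.split("/", 1) for field in fields[2:])
    return number, constituent_class, tags


ENTRY_70 = """\
1 A SC/A CP/G NG/F W/GLUBOKO1
2 A SC/A CP/D NG/F W/GLUBOKO1
3 A SC/A CP/I NG/F W/GLUBOKO1
4 A SC/A CP/L NG/F W/GLUBOKO1"""

alternatives = [parse_alternative(line) for line in ENTRY_70.splitlines()]
# The alternatives differ only in the value of the CP attribute.
cases = [tags["CP"] for _, _, tags in alternatives]
```

Representing the tags as a dictionary makes the point of the comment on (70) easy to check: the four alternatives share every tag except CP.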


2.2 Subclassification studies


2.2.1 Nouns

A pilot study of possible subclassification of nouns was conducted
with the aid of Library of Congress personnel. This study was an outgrowth
of work performed under earlier contracts and was based, in part, on
semantic subclasses of nouns encountered in the course of work on
preposition-noun phrases described in A9 of Section 2.1.4 of this report.

In an attempt to develop uniform standards of classification to
facilitate eventual processing of the results by computers, a special
questionnaire (Figure II-2) was compiled to be used in conjunction with
classification charts. The questionnaire contained yes-no questions and room
for recording the codes of information derived from the charts. Whereas the
queries in the questionnaire were directed at exploring the analyst's
Sprachgefühl and were unstructured, the charts represented an attempt to
illustrate graphically a proposed system of classification in the form of a
branching diagram. It was soon discovered, however, that the charts were
incomplete and difficult to follow. As a consequence, the questionnaire was
revised and used alone in subsequent experiments.

This section contains a description of the questionnaire, followed by
an assessment of the results obtained in the pilot study. It concludes with
some interim proposals for noun classification.
Figure II-2: TEST VERSION OF NOUN CLASSIFICATION SHEET REVISED January 14, 1060

[Questionnaire sheet not reproduced. Legible fields include RUSSIAN WORD,
ENGLISH MEANING (to serve as a guide), the sample adjective and noun
frames, a space for comments, DATE, and INITIALS.]

The questionnaire


The questionnaire was developed at IBM Research, but the bulk of
the experimental classification work was done by the Library of Congress
lexicographers. The numbering of questions is not continuous because of
the revisions made.

The English meanings in the questionnaire were intended to serve as
a guide for the identification of Russian words (see lower left corner of
Figure II-2). Non-standard or archaic usage was not considered. Reasons
for including a given question or group of questions are outlined below.
Since the design of the questionnaire is of a tentative nature, accompanying
each explanation are comments on how the questions were in fact interpreted
by the analysts taking part in the classification of nouns.

Questions 1-10: This group of questions is intended to establish the
properties of the noun under analysis (Nx) which are manifested in its
function as the subject of the sample verbs. Thus, BOLIT ('it aches')
excludes any animate agents and is limited to the nouns denoting parts and
certain defects of the body (RUKA 'hand', RANA 'wound'). BOLEET ('is sick,
ailing') typically admits animate agents and can also be combined with
(inanimate) nouns denoting living organisms (DEREVO BOLEET 'the tree is
ailing'), while CUVSTVUET ('senses') and NERVNICAET ('is nervous') admit
only animate agents. The notion of an animate agent includes both the
grammatically animate nouns (CELOVEK 'man') and formally inanimate nouns
characterized by "real" animateness (NASELENIE 'population'). To distinguish
the two types of animateness, RASTET ('grows') and UMEN6WAETS4
('diminishes') were added, since the former admits both types of animate
agents and the latter excludes grammatically animate nouns and a few
collective nouns of the type CETA ('couple'). Hence, on the basis of answers
provided regarding possible combinations with the above-mentioned verbs,
several semantic and syntactic features of a noun can be quickly established.
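
How such answers might be turned into features can be sketched as follows. This is an illustration only: the report does not give the procedure, the function name `infer_features` and the feature labels are hypothetical, and the decision logic is one possible reading of the verb restrictions described in the paragraph above.

```python
def infer_features(answers):
    """Map yes/no answers for the sample verb frames to tentative features.

    `answers` maps a verb frame such as "BOLIT" to True or False; frames
    that are missing count as "no". The feature labels are illustrative
    and are not the codes used on the classification sheet.
    """
    def yes(verb):
        return answers.get(verb, False)

    features = set()
    # CUVSTVUET and NERVNICAET admit only animate agents.
    if yes("CUVSTVUET") or yes("NERVNICAET"):
        features.add("animate agent")
        # UMEN6WAETS4 excludes grammatically animate nouns
        # (and a few collectives of the CETA type).
        if yes("UMEN6WAETS4"):
            features.add("formally inanimate with real animateness")
        else:
            features.add("grammatically animate or CETA-type collective")
    else:
        if yes("BOLIT"):
            features.add("part or defect of the body")
        if yes("BOLEET"):
            features.add("inanimate living organism")
    return features
```

Under this reading, a noun accepted by BOLIT and by none of the animate-agent frames (RUKA, RANA) is marked as a part or defect of the body, while a noun accepted by CUVSTVUET and UMEN6WAETS4 (NASELENIE, on the assumption that an analyst answers "yes" to both) is marked as formally inanimate but "really" animate.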

Questions  11-28:  Ability-inability  of  the  Nx  to  combine,  subject  to  proper 
agreement,  with  the  sample  adjectives  is  intended  to  bring  out  additional 
semantic  nuances. 

Questions 36-47: In all these tests the noun being analyzed must be in the
genitive case.

Questions 36-37: Ability-inability to combine with adverbial quantifiers of
the type MNOGO ('much, many') and the number of the governed noun are
important formal features. Some additional distinctions are: All count nouns
combine with MNOGO in genitive plural (MNOGO STUL6EV 'many chairs').
However, many nouns which combine with MNOGO in genitive plural are not
countable (MNOGO TRUDNOSTE1 'many difficulties'). The so-called "mass"
nouns combine with MNOGO in genitive singular (MNOGO SAXARA 'much
sugar'). However, the same is true of many other nouns, among them
WUM ('noise'), ZLO ('evil'), CVANSTVO ('boastfulness'). Many collective
nouns of the type PROLETARIAT ('proletariat') or STUDENCESTVO
('students - collectively') cannot combine with MNOGO at all. The same is
true of most "abstract" nouns, especially nomina actionis like ZAROJDENIE
('nascence'), VOPLO5ENIE ('incarnation') and others.

Questions 38-41: Ability to combine with prepositions and prepositional
constructs is self-explanatory. The semantic criteria sought are suggested
by the meaning of the respective prepositions: PO SLUCAH ('on the
occasion of'), VO VREM4 ('during'), and VNUTRI ('inside'). VO VREM4
should not be confused with VO VREMENA ('in the days of'), nor VNUTRI
with V ('in').

Question  42:  Is  Nx  governed  by  SOST04NIE  ('the  state  of')? 

Question  43:  This  question  should  be  answered  "yes"  only  for  normally 
countable  nouns  (*DVADQAT6  $GERMANI1  'twenty  Germanies'). 

Question 44: Can Nx appear in the context $ON NAZNACEN NA POST,
MESTO ... ('he is appointed (to the post of) ...')? The question should be
answered "yes" only when the resultant construction has equivalents like
the following: $ON NAZNACEN NA POST DIREKTORA - $ON NAZNACEN
DIREKTOROM ('he is appointed director') or $ON NAZNACEN NA
DIREKTORSKI1 POST ('he is appointed to the office of a director'). These
additional restrictions are intended to eliminate the instances of the
genitive only denoting "possession": $ON NAZNACEN NA MESTO BRATA ('he is
appointed to his brother's office').

Questions 45-47: Using ISKLHCENIE ('expulsion, elimination') or
SOKRA5ENIE ('contraction, reduction') as representing a class of deverbal
nouns, the question concerning subject genitive-object genitive distinction
was intended to confirm animate-inanimate distinction. It was initially
assumed that animate nouns could be both subject and object genitives, but
inanimate nouns only object genitives. This was an unfortunate oversight.
For example, in VRA5ENIE LUNY ('rotation of the moon'), one obviously
is confronted with subject genitive (cf. LUNA VRA5AETS4 'the moon
rotates'). The questions should have been eliminated.

Questions 48-51: In these tests the noun being analyzed should be in the
accusative. $4 CUVSTVUH (question 48) should be used in the sense 'I
experience' or 'I sense' (cf. Russian $4 ISPYTYVAH CUVSTVO ...).
Only nouns denoting various emotions and sensations seem to fit in this
context; for instance, BOL6 ('pain'), NEGODOVANIE ('disgust'), RADOST6
('joy') and some others. $4 PRIWEL V (question 50) can combine with nouns
denoting points in time (POLDEN6 'noon') or locations in space (GOROD
'city'), some of which can best be called generalized locations. Instances
of the latter type include military units such as POLK ('regiment'), CAST6
('unit'), ROTA ('company'), and others. In these cases PRIWEL is a verb of
motion: $4 PRIWEL V P4TNIQU ('I came on Friday'). When combined with
abstract nouns denoting certain emotional states, PRIWEL becomes a link verb
as evidenced in part by English translations: $4 PRIWEL V UJAS ('I was
horrified'); $4 PRIWEL V ISSTUPLENIE ('I was outraged'). $4 SIDEL ('I sat')
in question 51 can be followed only by the accusatives of time (e.g., GOD
'year', CAS 'hour', NEDELH 'week'). The three frames, then, help to
identify a number of divergent semantic distinctions: locations, time periods
and points in time, sensations, emotions, and emotional states.

Question 52: $4 STRADAL ILI BOLEL ('I suffered from or was sick with')
can be followed by a noun in the instrumental singular denoting afflictions
and diseases ($4 BOLEL TIFOM 'I was sick with typhoid'), or time periods
in the same way as questions 53 and 75. Plural nouns which also can follow
$4 STRADAL ILI BOLEL are restricted to countable nouns, namely, those
which can follow $4 SIDEL (question 51).

Questions 53-57: The next four questions represent an attempt to define the
function in the instrumental of nouns which can combine with NAZNACEN
('appointed') (question 53). Thus, "by" (question 54) is intended to signal
the agent instrumental ($ON NAZNACEN $IVANOM 'He was appointed by Ivan';
$IVAN EGO NAZNACIL 'Ivan appointed him'); the use of "as" (question 55) is
equivalent to 'when he was' (NAZNACEN MOLODYM CELOVEKOM 'appointed
as a young man'). Question 56 probes the possible use of the Nx as a
complement of the type NAZNACEN KOMANDIROM ('appointed as a
commander'). Other meanings include various adverbial functions like
NAZNACEN PRIKAZOM in the sense of NAZNACEN PO PRIKAZU ('appointed
on the order').

Questions 58-62: Questions 58-62 are self-explanatory but have proved
difficult to answer. Basically, they represent an attempt to identify
"concreteness" in the literal sense.

Questions 63-65: The three questions require Nx to be in the genitive.
KUSOK, CAST6 ('piece, part, portion') (question 63) establish "divisibility"
of the Nx. It seems that only literally "abstract" nouns cannot follow one
of the nouns of KUSOK or CAST6-type. For instance, *CAST6 GOTOVNOSTI
('a portion of preparedness'). Ability to combine with PREDSTAVITEL6
('representative') (question 64) is restricted (a) to nouns denoting beings
(PREDSTAVITEL6 RABOCIX 'workers' representative') and to formally
inanimate nouns which may acquire "real" animateness (e.g., ZAVOD 'factory'
or POLK 'regiment'); (b) to nouns denoting genera (PREDSTAVITEL6
MLEKOPITAH5IX 'a representative of the mammals'); and (c) to individual
humans mentioned by name. NACAL6NIK ('head'), DIREKTOR ('director'),
and KOMANDIR ('commander') should help to single out nouns mentioned in
(a).

Question  72:  This  question  establishes  whether  or  not  Nx  has  a  correspon¬ 
ding  impersonal  predicate.  For  instance,  STRAX-STRAWNO  ('fright,  fear 
--  it  is  frightening');  OPASNOST6-OPASNO  ('danger--it  is  dangerous'); 
MOROZ-MOROZNO  ('cold,  frost  --  it  is  chilly  (freezing)');  and  others.  It 
seems  that  this  test  would  apply  almost  exclusively  to  nouns  denoting  certain 
states,  including  meteorological  conditions  (TUMAN  'fog'  or  JARA  'heat'), 
and  emotions. 

Questions 74-75: Ability to combine with the verb in both frames is intended
to single out nouns denoting time periods and locations. $4 PROWEL
followed by the Nx in the accusative usually corresponds to the construction
known as accusativus extensionis (accusative of extent): $4 PROWEL
DVADQAT6 MIL6 ('I went twenty miles'). Constructions with the instrumental
can either contain nouns denoting locations (cf. Latin ablativus loci): $4
PROWEL POLEM ('I went via the field'), or those functioning as time
adverbials: $4 PROWEL LETOM ('I went through in the summer').
Unfortunately, many other constructions can occur in both frames, especially
if the context is increased. For instance, when the noun is in the accusative
case, PROWEL can combine with nouns denoting school subject course names
and other nouns: $4 PROWEL KURS NAUK ('I completed a course of study').
A noun in the instrumental can perform the same function as the one possible
in question 55. For instance, MOLODYM OFIQEROM $4 PROWEL KURS
NAUK ('as a young officer I completed a course of study').

Question 77: Is Nx a proper name?

Question 78: Is Nx a "nondescript" noun? This proved to be one of the
more difficult questions and, based on experience to date, this category
should be reworked or eliminated. The category was adopted from Eaton
(1961; 437) and was intended to identify nouns denoting items characterized
only by shape (SPIRAL6 'spiral'), function (CISLO 'number'), or quantity
(U1MA 'an awful lot'), etc.

Question  79:  The  ability  to  have  a  (geometric)  shape  was  intended  as  a  con¬ 
firmation  of  concreteness  of  the  noun. 

Question 83: The ability to combine with PRIKOSNUT6S4 K ('to touch') was
intended to reinforce the test for concreteness described immediately above.

Questions 84-86: Ability to govern the dative, infinitives, and deliberative
object (O 'about'; OTNOSITEL6NO 'concerning') phrases was queried.

Questions 87-100: These locations were reserved for codes obtained from
charts. As noted in the introduction to this section, charts were not used
after limited initial testing.

Results  of  the  pilot  study 

A total of about 2,000 nouns were processed in the course of the
study. As expected, considerable difficulties were encountered in a number
of areas. For instance, many of the component questions of the questionnaire
were subject to different individual interpretations and it was difficult
to establish uniformity. Quite often several meanings of the same word
form tended to become superimposed on one another despite apparent efforts
to the contrary. Although extensive, the questionnaire was weak in
distinguishing varieties of abstract and deverbal nouns of the type UPRUGOST6
('elasticity') or PROXOJDENIE ('passage'). Hence it appears that several
specialized questionnaires would have to be developed.

Because of these difficulties, several methods of computer and manual
sorting generally failed to group together items which might be expected
to be encoded identically. Since the answers contained in each questionnaire,
when reduced to numerical codes, were of the appropriate form (numerical
vectors), several attempts were made to process samples of nouns with the
aid of pattern recognition programs operating on such data, which had been
developed for another purpose within IBM Research. The use of these programs
required that an arbitrary number of groups be specified in advance.

By  means  of  clustering  techniques  analogous  to  those  described  in  Casey  and 
Nagy  (1966;  95-6)  the  pattern  recognition  programs  divided  a  given  sample 
of  nouns  into  the  preassigned  number  of  groups.  Although  a  number  of  ques¬ 
tions  can  be  raised  concerning  the  validity  and  the  methodology  of  this  pro¬ 
cedure,  the  results  obtained  are  interesting.  Predictably,  sufficiently  close 
groups  or  portions  of  larger  groups  brought  together  by  the  pattern  recogni¬ 
tion  programs  reflected  quite  clearly  such  categories  as  animate  nouns  or 
mass nouns as, for example, the following: GRAD ('hail'), KREMNI1 ('silicon'),
KRISTALL ('crystal'), PLENKA ('film'), SERA ('sulfur'), and
POROX ('powder') -- all of which can be called in some sense "mass nouns".
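
The grouping step described above can be illustrated with a minimal sketch. It is not the IBM pattern recognition program, whose details are not given here; it is a plain k-means-style procedure that, like the programs described, needs the number of groups fixed in advance. The four questions and all answer codes below are hypothetical and chosen only for illustration.

```python
# Minimal sketch: grouping questionnaire answer vectors into a preassigned
# number of groups. The answer codes are invented for illustration.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cluster(vectors, k, max_iter=100):
    """Assign each vector to one of k groups; returns a list of group indices.

    The first k vectors serve as initial centers so the result is repeatable.
    """
    centers = [list(v) for v in vectors[:k]]
    assignment = [-1] * len(vectors)
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda g: squared_distance(v, centers[g]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no noun changed group, so the grouping is stable
        assignment = new_assignment
        for g in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == g]
            if members:
                centers[g] = centroid(members)
    return assignment

# Hypothetical yes/no codes for four questions:
# [subject of BOLEET, subject of POGIB, modified by VYSOKI1, mass noun]
NOUNS = ["GRAD", "BRAT", "SERA", "JENA", "POROX", "KOMANDIR"]
VECTORS = [
    [0, 0, 0, 1],  # GRAD
    [1, 1, 1, 0],  # BRAT
    [0, 0, 0, 1],  # SERA
    [1, 1, 1, 0],  # JENA
    [0, 0, 0, 1],  # POROX
    [1, 1, 0, 0],  # KOMANDIR, with one divergent answer
]

GROUPS = cluster(VECTORS, k=2)
```

With these invented codes the divergent answer for KOMANDIR does not move it out of the animate group; with noisier coding, of the kind reported for the pilot study, such divergences would blur the groups.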

One example will serve to summarize the types of problems encountered
in trying to achieve uniformity of coding. The animate nouns in
Table II-12 below, brought together as a group by the pattern recognition
program, differed as shown with regard to their encoding on the questionnaire.
Many of the divergent answers shown in Table II-12 can be explained by
legitimate differences of opinion; others are outright errors. Whatever the
causes, the effect on processing the data by a computer is obvious.

Table II-12 (question column only; the per-noun answers are not legible)

(1)   Can Nx be the subject of BOLEET?
(10)  Can Nx be the subject of POGIB ('perished')?
(11)  Can Nx be modified by VYSOKI1 ('tall')?
(13)  Can Nx be modified by KRUPNY1 ('great')?
(15)  Can Nx be modified by MELKI1?
(24)  Can Nx be modified by TIXI1 ('quiet')?
      Can Nx be modified by GROMKI1 ('loud')?
(42)  Can Nx in the genitive follow SOSTO4NIE ('state of')?
(45)  Can Nx combine with ISKLHCENIE ('expulsion, exclusion'), etc.?
(46)  Is this a subject genitive?
(47)  Is this an object genitive?
(64)  Can Nx in the genitive follow PREDSTAVITEL6 ('representative')?
(65)  Can Nx in the genitive follow NACAL6NIK ('head'); DIREKTOR
      ('director'), etc.?

The pilot study described above produced few immediately usable results.
However, it provided some valuable insights into the possibility of
using such methods in the future. Some short-range prospects are
described immediately below.

Some interim proposals

A comprehensive classification of nouns suitable for syntactic analysis
of Russian is not a likely prospect for the near future. It seems that in
experimental research on Russian grammar one should concentrate on specific
syntactic and semantic features of nouns. Such features can be gradually
introduced into the recognition system, which would provide a vehicle for
testing them and evaluating their effectiveness. In addition to features
proposed in this report for the recognition of appositive constructions (cf. A4
of Section 2.1.4), the following seem to merit further consideration.

a.  Animateness. Formally, animateness is reflected in identity of form
of the genitive and accusative case of certain nouns (the notable
exceptions are feminine nouns of the first declension in -A (RUKA 'hand')
and those of the MAT6 ('mother') type). As suggested in the discussion
of questions 1-10 of the questionnaire, however, it is necessary to
consider a category of nouns which can act as animate agents
of certain verbs although formally inanimate. For instance, KOLXOZ
SLUWAL ... ('The collective farm listened to ...').

b.  Personification. Ability of a noun to act as an animate agent should
perhaps be treated as a part of the broader problem of personification; 4
i.e., the ability of an abstract noun to act as a concrete agent: $EGO
SLOVA VREZALIS6 V PAM4T6 (literally: 'His words plowed into the
memory'). Related to but not necessarily identical with personification
are problems of "concretization" of abstract nouns as evidenced from
such factors as countability and pluralizability. For instance,
SOKRA5ENIE RASXODOV ('curtailment of expenditures'), but DVADQAT6
WEST6 SOKRA5ENI1 ('twenty-six abbreviations').

c.  Substantivization. Although substantivization (ability of a morphological
part of speech other than a noun or of other constructions to function
syntactically as a noun) is not normally considered a part of noun
subclassification, such substantivized items exhibit a number of peculiar
features. For instance, the ability to combine with adverbs (T4JELO
BOL6NO1 'seriously ill patient') or the ability to be modified by
adjectives (BESKONECHNYE "NEL6Z4" VREZALIS6 V PAM4T6 'The
endless "you can't do that"s plowed into the memory').

d.  Deverbal nouns. Since deverbal nouns, typically those having the
-ENIE, -KA, and zero-affix (VZRYV 'explosion'), duplicate many of
the characteristics of the verb, they should be identified as such. This
feature is important in a variety of constructions but especially in
preposition-noun phrases and instances when a deverbal noun governs a
noun in the genitive. In preposition-noun phrases, the different
syntactic function of the construction for normal and deverbal nouns is
suggested by the contrasts in translation: ON PRIWEL V KOMNATU ('he
came into the room'), versus ON PRIWEL V VOLNENIE ('he became
excited'); ON JIL PRI BOL6NIQE ('he lived at the hospital'), versus
ON POSTRADAL PRI POSADKE ('he suffered injury in landing'). In
instances when the deverbal noun governs a noun in the genitive, object
genitive and subject genitive constructions are distinguished on the basis
of the underlying verb. Thus if KRICAT6 ('to scream') is an intransitive
verb, KRIK KOMANDIRA ('commander's scream') can only be a subject
genitive; if NERVNICAT6 ('to be nervous') can only have animate
subjects, it is possible to encounter NERVNICANIE JENY ('wife's
nervousness'), but not *NERVNICANIE STULA (*'nervousness of the
chair'). Within the group of deverbal nouns one may further distinguish
between those that preserve the verbal characteristics and those
that remain deverbal in derivation only: PROVODY ('seeing off (of
someone)') versus PROVODA ('wires') or SOKRA5ENIE ('(process of)
curtailment/contraction') versus SOKRA5ENIE ('abbreviation/abbreviated
word').


e.  Action  nouns.  A  number  of  nouns  denoting  various  actions  are  not  de¬ 
verbal  in  the  strict  morphological  sense  as  those  mentioned  above 
(SPIN  'spin',  REAKQI4  'reaction')  or  their  morphological  relationship 
to  a  verb  is  obscure  (RABOTA  'work').  Such  nouns  share  certain  prop¬ 
erties  with  deverbal  nouns  but  require  additional  study. 

f.  Adverbial  function.  Many  nouns,  especially  in  the  accusative  and  in¬ 
strumental  (cf.  discussion  of  questions  74  and  75  of  the  questionnaire), 
acquire adverbial functions: BOLVANKA VESIT TONNU ('the ingot
weighs a ton').
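
The distinction drawn in item (a) between formal animateness and the ability to act as an animate agent can be sketched as two independent features. This is only an illustration: the tag names ANIM and AGENT, the tiny lexicon, and the agent list are assumptions made here, not part of the CSA grammar or the RMD coding. The formal test compares singular forms, so it would need plural forms for the feminine exceptions mentioned in the text.

```python
# Minimal sketch: formal animateness (accusative == genitive) kept apart
# from the separately coded ability to act as an animate agent.
from dataclasses import dataclass

@dataclass(frozen=True)
class NounForms:
    nominative: str
    genitive: str
    accusative: str

def formally_animate(forms: NounForms) -> bool:
    """Formal test from the text: accusative identical to genitive."""
    return forms.accusative == forms.genitive

# Singular forms in the report's transliteration (illustrative lexicon).
LEXICON = {
    "BRAT": NounForms("BRAT", "BRATA", "BRATA"),
    "STOL": NounForms("STOL", "STOLA", "STOL"),
    "KOLXOZ": NounForms("KOLXOZ", "KOLXOZA", "KOLXOZ"),
}

# Hypothetical hand-coded feature: nouns accepted as agents of verbs
# that otherwise require animate subjects (cf. KOLXOZ SLUWAL ...).
ANIMATE_AGENTS = {"BRAT", "KOLXOZ"}

def feature_tags(noun: str) -> dict:
    """Return the two features as attribute/value pairs."""
    return {
        "ANIM": formally_animate(LEXICON[noun]),
        "AGENT": noun in ANIMATE_AGENTS,
    }
```

Here `feature_tags("KOLXOZ")` yields a formally inanimate noun that is nevertheless admitted as an animate agent, which is the case the text singles out.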


The  list  of  such  features  can  be  extended.  Some  have  been  suggested 
elsewhere,^  others  can  be  easily  derived  in  the  course  of  research  once  it 
is  undertaken.  In  most  instances,  however,  such  a  classification  will  yield 
little  without  a  corresponding  parallel  analysis  of  other  parts  of  speech  -- 
in  particular,  verbs. 


2.2.2 Verbs


A  limited  study  of  verbs  was  undertaken,  utilizing  a  portion  of  the 
classificatory  criteria  described  in  Andreyewsky  (1965).  Some  1800  verbs 
contained  in  a  special  brochure  published  by  the  USSR  Academy  of  Sciences 
(Demidova et al., 1963) were analyzed jointly at IBM Research and at the
Library  of  Congress.  As  with  the  nouns,  the  results  of  the  verb  classifica¬ 
tion  were  processed  by  the  pattern  recognition  programs. 


The  test 


The  verbs  were  coded  according  to  their  ability  to  combine  with 
selected  prepositional  phrases,  certain  adverbs,  and  the  CTO-introduced 
object  clauses.  The  objective  was  to  test  not  only  the  ability  of  a  given  verb 
to  co-occur  with  certain  types  of  phrases  or  classes  of  adverbs,  but  also  to 
trace  what  effect,  if  any,  the  verb  has  on  their  syntactic  function.  Partici¬ 
pants  in  the  test  were  presented  with  a  list  of  potential  verbal  environments 
(Figure II-3) and asked to indicate whether or not the verb could occur in
them  in  one  or  more  of  the  indicated  senses.  The  particular  nouns  and 


Figure II-3: TEST ENVIRONMENTS OF VERBS

 1)  ... DO MEN4            (A) before me              (B) as far as me
 2)  ... DO RASSVETA        (A) before dawn            (B) until dawn
 3)  ... IZ-ZA STOLA        (A) because of the table   (B) from behind the table
 4)  ... K MITINGU          (A) for the meeting        (B) to the meeting
 5)  ... K NAM              (A) to us                  (B) toward us
 6)  ... ZA OBEDOM          (A) after (to get) dinner  (B) during dinner
 7)  ... U $ZINY            (A) at Zina's              (B) from Zina
 8)  ... POD KAPUSTU        (A) for cabbage            (B) under cabbage
 9)  ... ZA STOL            (A) at the table           (B) behind the table
10)  ... ZA BRATA           (A) in brother's place     (B) for brother's sake
11)  ... PO OWIBKE          (A) a mistake apiece       (B) by mistake
12)  ... 45IK IZ-POD UGL4   (A) coal crate             (B) crate from under the coal
13)  ... O STOL *           against the table
14)  ... PO VODU *          to get water
15)  ... CTO NAPIWET *      that + (subject) + will write
16)  ... NADVOE *           in two (as in cutting)
17)  ... OCEN6 *            very much
18)  ... O SESTRE *         about the sister

*  only test ability to combine in the meaning indicated


pronouns appearing as preposition complements in the sample environments
were regarded as representatives of classes rather than in terms of their
precise lexical content, e.g., RASSVETA ('dawn') in item (2) of Figure II-3
stood for the class of "event nouns".
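
One way to turn a participant's answers for a verb into the kind of numerical vector used by the pattern recognition programs is sketched below. The slot layout (two positions for each of items 1-12, one for each starred item 13-18) is an assumption made for illustration, as are the sample answers; the report does not state how the answers were actually coded.

```python
# Minimal sketch: coding a verb's accepted test environments as a 0/1 vector.

TWO_SENSE_ITEMS = range(1, 13)   # items 1-12 offer senses (A) and (B)
ONE_SENSE_ITEMS = range(13, 19)  # items 13-18 are starred: one meaning only

# One slot per (item, sense) pair, in figure order: "1A", "1B", ..., "12B",
# followed by "13" ... "18".
SLOTS = [f"{item}{sense}" for item in TWO_SENSE_ITEMS for sense in "AB"]
SLOTS += [str(item) for item in ONE_SENSE_ITEMS]

def encode(accepted):
    """Return a 0/1 vector with a 1 for every accepted environment/sense."""
    accepted = set(accepted)
    unknown = accepted - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown environment codes: {sorted(unknown)}")
    return [1 if slot in accepted else 0 for slot in SLOTS]

# Hypothetical answers for one verb: it accepts DO RASSVETA in both senses,
# K NAM in sense (B), and OCEN6.
EXAMPLE_VECTOR = encode({"2A", "2B", "5B", "17"})
```

Keeping senses (A) and (B) in separate slots preserves the information the test was after, namely the effect of the verb on the syntactic function of the phrase and not merely its co-occurrence.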

Results  of  the  test 

An analysis of the results obtained from this study of verbs suggests
that although the criteria used can improve the recognition of verb-governed
preposition-noun phrases, a much more comprehensive study of verbs will
have to be undertaken in the future. Some of the questions can be formulated
much more clearly by substituting interrogative adverbs or other
prepositional constructions. For instance, 3A and 3B can be replaced by POCEMU
('why') and OTKUDA ('where ... from'), respectively.

Possible extensions of verb classification

The limited study of verb classification conducted during the contract
period represented an attempt to deal with a number of the so-called
Aktionsart distinctions, implicit in the ability of verbs to combine with various
adverbials. Briefly, this view holds that while a pair of verbs like PROCITAT6
('to read', perfective) and PROCITYVAT6 ('to read', imperfective) is an
aspectual pair, the distinction in aspect is secondary in the case of a pair such
as CITAT6 ('to read', imperfective) and PROCITAT6 ('to read', perfective);
instead, one deals with Aktionsart, a "manner of action" distinction, because
the latter verb form describes a particular way the process of reading
transpired, as reflected in the possible English equivalent 'to read through'.^
Since most Aktionsart distinctions are related to various affixed forms of the
basic verb, affixation of the Russian verb and Aktionsart must be studied
concurrently.

Aspect and Aktionsart are important for the recognition of constructions
consisting of verbs and adverbials. Thus, VSEGDA ('always') only
rarely occurs with perfective verbs (*VSEGDA PROCITAL 'always read
through'), while the attenuative Aktionsart of the verb -- e.g., POPZABYL
('(partially) forgot') -- cannot combine with adverbs like OKONCATEL6NO
('completely, finally'). The prefix of a verb is also important in certain
instances of verb government of preposition-noun phrases, e.g., PODOWEL
K ... ('walked up to ...') or VNES V ... ('carried in(to) ...'). Some
restrictions on the type of noun which can appear in such constructions must
be worked out -- for example, NAPISAL NA BUMAGE ('wrote on paper'),
but NAPISAL NA RADOST6 STRANE ('wrote to the delight of the country').
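
The two verb-adverb restrictions mentioned above can be sketched as checks on coded verb features. The feature names and the form of the check are assumptions for illustration only; the adverb glossed 'completely, finally' is written OKONCATEL6NO here, and the attenuative verb is represented by a placeholder form because only its coded feature matters to the check.

```python
# Minimal sketch: flagging doubtful verb-adverb combinations from coded
# aspect and Aktionsart features (feature names are hypothetical).
from dataclasses import dataclass

@dataclass(frozen=True)
class VerbForm:
    form: str
    aspect: str           # "perfective" or "imperfective"
    aktionsart: str = ""  # e.g. "attenuative"; empty when none is coded

def doubts(adverb: str, verb: VerbForm) -> list:
    """Return the reasons, if any, for doubting the combination."""
    reasons = []
    if adverb == "VSEGDA" and verb.aspect == "perfective":
        # Stated in the text as a strong tendency, not an absolute rule.
        reasons.append("VSEGDA only rarely occurs with perfective verbs")
    if adverb == "OKONCATEL6NO" and verb.aktionsart == "attenuative":
        reasons.append("attenuative verbs do not combine with this adverb")
    return reasons

PROCITAL = VerbForm("PROCITAL", "perfective")
CITAL = VerbForm("CITAL", "imperfective")
# Placeholder standing in for any verb coded as attenuative.
ATTENUATIVE_VERB = VerbForm("V_ATTENUATIVE", "perfective", "attenuative")
```

Returning reasons instead of a bare yes/no leaves room to treat the first restriction as a preference and the second as a prohibition, in line with the wording of the text.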

The need to develop selection restrictions for possible subjects and
objects of a given verb has been mentioned in conjunction with noun
classification. In addition, it is necessary to establish a more basic distinction:
the ability or inability of a given verb or its forms to appear in personal and
impersonal sentences. Thus, NERVNICAT6 ('to be nervous') and SVETAET
('it is getting light') are examples of a personal and an impersonal verb,
respectively; KAZALOS6 in NAM KAZALOS6 ('it appeared to us') and
PUGALO in NAS PUGALO ('we were afraid') are impersonal forms of personal
verbs. The semantic content of the subject is sometimes important in
establishing the voice of a predication containing a reflexive verb as shown, for
example, by the contrast between DETI MOHTS4 ('children wash themselves')
and REDISKA MOETS4 ('radishes are washed').

Although certain features of verbal government are encoded in the
Russian Master Dictionary (RMD), additional improvements are necessary.
These include (a) distinctions of the sort discussed above under
classification of nouns and (b) information about the ability of a transitive verb to
function intransitively (in its "absolute form").

Yet  another  area  requiring  study  is  treated  in  most  Soviet  grammars 
under  the  heading  of  "compound  predicates"  (sostavnye  skazuemye).^ 

Classes of complements have to be defined and the ability of a verb to
combine with each class studied. Thus, it seems that almost all Russian verbs
can have a full-form adjectival complement in the instrumental or the
nominative case: UMER MOLODYM ('died young'). Some verbs can, in addition,
have substantive complements (KAZALS4 GENERALOM 'appeared to be a
general') and, less frequently, impersonal predicates and other
complements: STALO OCEVIDNO ('it became apparent'). Only the copula BYT6
('to  be')  can  freely  combine  with  all  of  the  complement  types. 

The  above  discussion  presents  a  summary  of  various  areas  which 
will  have  to  be  examined  preparatory  to  undertaking  a  verb  classification. 
Although  a  comprehensive  classification  of  verbs  may  be  a  long  way  off,  as 
suggested  in  the  case  of  noun  classification,  certain  of  the  features  that  can 
be  expected  to  enter  into  any  such  scheme  --  among  them  items  discussed 
here  --  can  profitably  be  studied  individually. 

2.2.3 Adverbs

Based  on  materials  contained  in  Prokopovich  (1962),  an  effort  was 
made  to  classify  adverbial  entries  in  the  Russian  Master  Dictionary  (RMD) 
according to their ability to combine with verbs, nouns, and adjectives. Two
factors, however, led to the abandonment of this study as impractical: first,
the adverbial entries in the RMD -- class (R7)N -- were found to include
much linguistically unrelated material (words and phrases coded as "adverbs"
in  order  to  preclude  their  analysis  in  RMD  applications);  second,  the 
bulk  of  the  true  adverbs  were  not  in  (B7)N  at  all,  since  they  were  handled  in 
the  RMD  as  derived  from  adjectival  stems  or  by  means  of  other  routines. 


The need for such a classification of adverbs is pressing, however,
because  little  information  is  available  in  Soviet  grammars.  Some  comments 
about  verb-adverb  constructions  appear  in  the  immediately  preceding  dis¬ 
cussion  of  verb  classification.  Adverb-noun  and  adverb-adjective  construc¬ 
tions  were  not  studied  in  any  degree  of  detail. 

2.3  Related  language  processing  activities 

As part of the statistically oriented side of the grammatical research
carried  out  during  the  contract  period,  the  RAND  corpus  of  Air  Force  Rus¬ 
sian  texts  was  partially  processed  for  the  purpose  of  obtaining  statistical 
information  on  lexical  frequency  --  information  which  could  serve  both  to 
measure  the  coverage  of  the  Russian  Master  Dictionary  (RMD)  and  as  a 
guide  to  future  lexical  research.  In  addition,  five  updates  of  the  RMD  were 
carried  out  in  order  to  incorporate  the  results  of  the  work  performed  by  the 
Library  of  Congress  lexicographers. 

Processing  of  the  RAND  Russian  text  corpus 

About  five  million  words  of  Russian  text  transcribed  onto  paper  tape 
by  the  Air  Force  and  later  transferred  to  magnetic  tape  by  the  RAND  Corpo¬ 
ration  were  partially  processed  during  the  contract  period  with  a  view  to¬ 
wards  obtaining  paradigm  frequency  statistics.  A  large  part  of  the  consid¬ 
erable  amount  of  programming  and  processing  involved  in  such  an  under¬ 
taking  was  completed,  but  further  work  had  to  be  stopped  at  the  end  of  the 
contract  period,  short  of  producing  final  results. 

Initial  difficulties  involved  rather  tedious  problems  of  code  conversion 
and  text  editing.  Once  the  text  was  listed  in  human-readable  form,  it  was 
found  that  some  30  per  cent  of  it  was  unusable  due  to  such  factors  as  garbling 
and  reversal  of  paper  tapes,  thus  reducing  the  total  corpus  to  some  three  and 
one-half  million  Russian  word  form  occurrences.  When  sorted  and  counted, 
this corpus was found to contain about 100,000 unique word forms.

The next objective was to map word forms automatically onto their
respective RMD stems in order to obtain a first approximation to paradigm
frequency statistics. Since no appropriate program existed for matching
Russian text words against stems in the RMD on a general-purpose computer,
a new dictionary lookup program had to be written. The preparation of this
program involved considerable effort in construction of usable tables of
Russian endings from the original grammatical classification charts employed in
preparing the RMD entries. Difficulties were encountered in processing
multiple-stem (compound) Russian words.
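
The general idea of matching word forms against stems with the help of ending tables, and of accumulating paradigm frequencies from the matches, can be sketched as follows. This is a toy illustration and not the lookup program that was written: the class name N1, its ending table, and the two stems are assumptions, and no attempt is made to handle multiple-stem (compound) words.

```python
# Minimal sketch: stem + ending lookup and paradigm frequency counting.
from collections import Counter

# Hypothetical ending table for one declension class.
ENDINGS = {
    "N1": {"", "A", "U", "OM", "E", "Y", "OV", "AM", "AMI", "AX"},
}

# Hypothetical stem dictionary: stem -> grammatical class.
STEMS = {"STOL": "N1", "KURS": "N1"}

def lookup(word_form):
    """Return every stem that accounts for word_form as stem + ending."""
    matches = []
    for cut in range(1, len(word_form) + 1):
        stem, ending = word_form[:cut], word_form[cut:]
        stem_class = STEMS.get(stem)
        if stem_class is not None and ending in ENDINGS[stem_class]:
            matches.append(stem)
    return matches

def paradigm_frequencies(tokens):
    """Count occurrences per stem and the number of unmatched tokens."""
    frequencies = Counter()
    unmatched = 0
    for token in tokens:
        stems = lookup(token)
        if stems:
            frequencies.update(stems)
        else:
            unmatched += 1
    return frequencies, unmatched

SAMPLE = ["STOL", "STOLA", "KURS", "STOLU", "XYZ"]
FREQUENCIES, UNMATCHED = paradigm_frequencies(SAMPLE)
```

Trying every cut point is the simplest way to let the ending tables decide where the stem ends; a production program would presumably index the stems to avoid repeated probing.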

Preliminary tests indicated that about 95 per cent of the word form
occurrences in the corpus could be matched against the RMD. The remaining
5 per cent represented either words not in the dictionary or misspelled
words, which are rather numerous in this corpus. Some 25,000 dictionary
stems were needed to match those word forms that could be matched at all.
This is about a quarter of the total number of single-word stems in the
dictionary (phrase entries were ignored in this test). Only about 40 per cent
of the 25,000 stems were matched against words with total frequency of
occurrence greater than one in the corpus. Unfortunately, time did not
permit pursuit of the investigation beyond the obtainment of these somewhat
suggestive, but necessarily inconclusive, results.

Russian Master Dictionary updates


The bulk of the improvements incorporated into the RMD by the
Library of Congress personnel between November of 1964 and August of
1965, together with other changes, additions, and deletions subsequent to
the latter date were processed during the contract period. The changes in
the RMD between August 1965 and July 1966 are reflected in Table II-13.

Table II-13: Size of the Russian Master Dictionary

RMD Version    Total number of entries    Date
181            165,202                    8.19.1965
182            unavailable                11.13.1965
183            161,189                    1.20.1966
184            152,541                    5.04.1966
185            135,518                    7.01.1966
186            135,225                    7.27.1966

The size of respective updates is shown in Table II-14.

Table II-14: Size of the Russian Master Dictionary Updates

RMD Version    Updated on    New Version    Total adds and deletes in update
181            11.13.1965    182            10,244
182            1.20.1966     183            26,318
183            5.04.1966     184            19,092
184            7.01.1966     185            17,023
185            7.26.1966     186            503

Total adds and deletes in RMD updating: 73,180
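
As a cross-check on Table II-14, the per-update counts should sum to the stated total of 73,180. The short sketch below works only from the four clearly legible counts and the stated total, and derives the value that the remaining entry (the update of version 184 to 185) must have for the table to be consistent; the variable names are illustrative.

```python
# Minimal sketch: consistency check of the update counts in Table II-14.

STATED_TOTAL = 73_180

# Clearly legible per-update counts (old version -> new version).
LEGIBLE_COUNTS = {
    "181->182": 10_244,
    "182->183": 26_318,
    "183->184": 19_092,
    "185->186": 503,
}

# Count implied for the 184->185 update if the stated total is correct.
IMPLIED_184_TO_185 = STATED_TOTAL - sum(LEGIBLE_COUNTS.values())

# Net change in dictionary size between versions 181 and 186 (Table II-13).
NET_CHANGE = 135_225 - 165_202
```

The implied count agrees with the digits that are legible in the printed entry, and the net change shows that deletions outweighed additions over the period.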


NOTES 


1.  Sentential  punctuation  was  not  studied  in  detail.  In  declarative  sen¬ 
tences,  this  is  usually  the  period  at  the  end  of  the  sentence.  However, 
in  certain  instances  not  considered  in  the  present  study  (direct  speech, 
for  example),  punctuation  may  appear  on  both  sides  of  the  sentence. 

2.  For  actual  word  forms,  the  traditional  parts  of  speech  were  used. 
These  classes  are  not  discussed  as  a  group  in  this  report,  but  indi¬ 
vidual  parts  of  speech  are  described  where  pertinent  to  the  discussion 
of  particular  sets  of  rules. 

3.  In Russian, the function words are: conjunctions, prepositions,
particles, and interjections; all other part-of-speech classes are lexical
words and include: nouns, verbs, adjectives, numerals, adverbs,
pronouns, and words of the category of state ("forms in '-O'"). For
English, see, e.g., Roberts (1954; 15-24).

4.  Appositive ties are illustrated by constructions discussed in Part A4
of Section 2.1.4. See also note 29.

5.  The term slovosochetanie is used here in the sense defined in V. V.
Vinogradov, ed., Grammatika russkogo iazyka (Grammar of the Russian
Language) (Vol. II.1, pp. 10-62), henceforth referred to as the
"Academy Grammar".

6.  Such constituents are the subject of a monograph by Cheshko (1960),
but were not investigated as part of the present study. Occasional
examples encountered in the 160-sentence sample of Pravda editorials
were allowed to be absorbed by constituents right-adjacent to them.

7.  Coordinative and subordinative conjunctions are not specifically
enumerated. See Part C of Section 2.1.4 and note 66 for coordinative
conjunctions. For further comments about subordinative conjunctions
see Part D of Section 2.1.4.

8.  This term is used in the sense defined by Nida (1960; 59). The extent
to which strings of cardinal numerals can be called accumulative is
open to question. However, there are sufficient formal similarities
to justify such usage in this report.

9.  This restriction refers only to what, according to Gleason (1961; 132),
would be called immediate constituents of such constructions.

10. This is a temporary restriction which was intended to limit the scope
of initial investigations. Exclusion of strings of characters other than
Russian is intended to avoid temporarily the need to consider such
items as formulas, equations, Latin names, etc., since these have to
be treated as part of the overall problem of substantivization which
could not be studied for lack of time.

11. The effects of exhaustive automatic application of essentially
context-free phrase-structure grammar rules requiring that each structure
recognized by the grammar be analyzed into two immediate constituents
have been described by Robinson (1966). What she described as
the problem of overstructuring of endocentric constructions and the
"doubtful propriety of permitting more than one way of structuring"
is illustrated for Russian by the example STARYE STENY GORODA
('old walls of the city').


12. Instances of the latter type are illustrated by the contrasting
interpretations of the sentence $ON KUPIL DOM V $LONDONE ('He bought a
house in London').


13. Despite the importance of word order in surface structure recognition,
the available information on Russian is fragmentary and much emphasis
is placed on semantic distinctions which would effectively require
discourse analysis capabilities. Two recent monographs (Sirotinina
(1966) and Schaller (1966)) based on an extensive manually processed
corpus contain interesting insights. Attempts to provide a theory of
word order are reflected in Kholodovich (1966). During 1966 a number
of articles in Russkii iazyk v shkole dealt with the "semantic
parsing" (aktual'noe chlenenie) of the sentence proposed by Mathesius
in 1947. Chapters on word order are given in all the standard
references used in the present study (see note 58). Gorbachtk (1964)
contains an interesting summary of the conventional statements about
word order in Russian grammar.


14. The example is borrowed from Gvozdev (1961; 17). Another illustration
is given at the end of the discussion of subjectless predications
in Part B of Section 2.1.4.


15. This problem is disregarded in subsequent illustrations given in this
report. For instance, if some constituent B and constituent C, agreeing
in number only, produce a constituent D, then the hypothetical
rules would have to be of the form

(a)  B NG/X-P + C NG/X-P » D NG/X

(b)  B NG/P + C NG/P » D NG/P

Note that the value of the NG attribute in (a) is set to "any except
plural" and in (b) it is only plural.


16. One of the few instances where gender distinctions are significant in
the plural is in dealing with adjective-noun agreement affected by
cardinal numerals, where it is important to know the gender of the
noun (cf. Part A8 of Section 2.1.4). Other instances occur in cases
of agreement between coordinated adjectives and nouns.

17.  In  part  responsible  for  the  choice  were  the  subjective  preferences  of 
those  involved  with  the  development  of  RG2;  it  appeared  that  an  initial 
swelling  in  the  number  of  experimental  rules  was  preferable  to  a  great 
number  of  tags,  a  situation  which  created  considerable  difficulties  in 
RG1.  With  the  availability  of  ordering  of  subrules  as  an  option,  the 
desired  "transparency"  of  rules  can  be  accomplished  more  economi¬ 
cally  than  could  be  done  for  RG2  rules  without  employing  the  device  of 
constituent  renaming. 

18. Short-form adjectives and short-form participles (SF) can function
only predicatively, and hence are not considered here.

19. In the course of linguistic research carried out under the present
contract, a variety of adjectival strings have been identified. However,
they are not discussed in this report because of space limitations.
Many of these strings result from coordination and can have the type
of structure described for such compounds (2.1.4C). Some of the
strings of asyndetic form have a variety of meanings which are either
rare in expository writing (cf., e.g., repetition for emphasis
VYSOKI1, VYSOKI1 DOM ('a very tall house')) or cannot at present
be recognized, e.g., NOVA4, LUCWA4 JIZN6 ('a new (i.e.,) better
life').

20. Ordinal numerals, which some authors (e.g., Vinogradov (1947;
233-6)) consider "ordinal adjectives", are treated as a separate
constituent class R. The decision to follow the prevailing viewpoint was
motivated by peculiarities of ordinal numeral-cardinal numeral agreement
and the use of ordinal numerals in fractions. Both topics were
studied during the contract period, but are not further discussed in
this report.

In addition to distinguishing between active and passive participles,
special consideration should have been given to reflexive participial
forms, because their syntactic function approaches that of passive
participles although their morphological properties are those of active
participles.

21. In oblique cases (all cases except nominative and accusative), all
numerals, except SC/S1, SC/T, and SC/M when they act as nouns, are
in the same case as the noun, which must be plural. SC/S1 numerals
agree in all cases in case, number and gender. SC/M and SC/T
numerals govern the noun in the genitive plural irrespective of the case
they themselves are in.
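
The agreement pattern stated in note 21 can be restated as a small lookup. The sketch below is only an illustration under stated assumptions: the function name, the case labels, and the return layout are hypothetical; the class labels S1, T, and M are taken from the tags SC/S1, SC/T, and SC/M in the note; and the qualification "when they act as nouns" is not modeled.

```python
# Oblique cases in the sense of note 21: every case except nominative and accusative.
OBLIQUE_CASES = {"genitive", "dative", "instrumental", "prepositional"}


def noun_requirement(numeral_class, case, number="singular", gender=None):
    """Return the features note 21 requires of the noun, or None when the
    note states no requirement.

    numeral_class: "S1", "T", "M", or any other label (hypothetical encoding).
    case, number, gender: features of the numeral itself.
    """
    if numeral_class == "S1":
        # SC/S1 numerals agree with the noun in case, number, and gender.
        return {"case": case, "number": number, "gender": gender}
    if numeral_class in ("T", "M"):
        # SC/T and SC/M numerals govern the genitive plural, whatever their own case.
        return {"case": "genitive", "number": "plural"}
    if case in OBLIQUE_CASES:
        # All other numerals stand in the same case as the noun, which must be plural.
        return {"case": case, "number": "plural"}
    # Nominative and accusative uses of other numerals are outside note 21.
    return None


print(noun_requirement("S5", "dative"))
print(noun_requirement("T", "instrumental"))
```

Returning None for the nominative and accusative keeps the sketch from asserting anything the note itself does not say.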

22. Since tag values cannot start with a numeral but must begin with an
alphabetic character, the letter "S" was arbitrarily chosen here as
a prefix to "1", "2", etc.

23. These statements apply to all cardinal numerals, whether spelled out
in full or represented by Arabic numerals.

24. The fact that these numerals are really nouns from a morphological
standpoint is demonstrated by their number and gender distinctions.

25. In most modern Russian prose, denominations greater than a billion
are seldom spelled out and are accordingly not considered here.

26. Such other purposes include employment in the rules necessary for
adjective-noun agreement affected by numerals, a topic which is
discussed briefly in A8 of the present section.

27.  Detailed  information  about  punctuation  of  such  predications  has  been 
collected  from  various  sources  but  is  not  included  for  reasons  of 
space  limitations. 

28. The use of this term has been checked, among others, in Jespersen
(1964, 1965) and Roberts (1954). So-called appositive adjectives (a
man, lean and hungry, walked in) are usually referred to in Russian
grammars as "detached" (obosoblennye) except for Unbegaun (1957;
300). The "appositive genitive" (the city of San Francisco) and
"appositive clauses" introduced by subordinating conjunctions (This does
not explain the fact that he knew where to find it) are not called
appositive in Russian. The English examples used in these illustrations
are from Roberts (1954; 467-8). The sense in which the term is used
in this report generally follows Rudnev (1959; 30-43).

29. In addition to the close appositions described below, six other types of
appositions were studied during the contract period, but are not
discussed at this time. The six types are illustrated by the following
examples:

(1) Hyphenated appositions: JEN5INA-VRAC ('a woman doctor')

(2) Appositions where the appositive is enclosed in quotation marks:
GAZETA "$PRAVDA" ('the newspaper Pravda')

(3) Appositions where the appositive is enclosed within parentheses:
NEGUS (KOROL6) ('Negus (a king)')

(4) Detached appositive constructions: $PRAVDA, ODNA IZ
KRUPNE1WIX GAZET V $S$S$S$R, ... ('Pravda, one of the
largest newspapers in the USSR, ...')

(5) Detached appositive constructions where the appositive is
introduced by a special conjunctive word: $TOLSTO1, KAK
PISATEL6, GENIALEN ('Tolstoy as a writer is a genius')

(6) Miscellaneous mixed types as, for instance, LETCIK-ISPYTATEL6
$NEUDACNIKOV ('test pilot Neudachnikov').

30. Animateness is based on grammatical criteria. For further details,
see the discussion of subclassification of nouns in Section 2.2.1.

31. "Humans", as distinguished from non-humans (cf. Greek alogos), are
real or imaginary animate beings capable of speaking. To designate
all animate non-humans as animals is obviously imprecise, but
convenient for present purposes.

32. Epithet is used as an equivalent of the Russian prozvishche or
prozvanie. Generally, an epithet differs from a name in its indeclinability.

33. This feature is intended to single out nouns which either identify
classes of terrestrial and celestial locations or names of such locations.
Thus OZERO ('lake'), PLANETA ('planet'), $BA1KAL ('Baikal'), and
some others are "geographic" nouns. The following nouns are not:
KANAVA ('ditch'), DNO ('bottom'), POLE ('field').

34. Botanical nouns identify species and genera of plant life as well as
names of their members. Latin names have not been considered.

35. "Nomenclature items" include various product designations and usually
consist of an abbreviation and/or numerals. For instance, $D$T-54
('DT-54' -- a diesel tractor). Agent 007 and the characters in
Zamyatin's We notwithstanding, it seems unreasonable to introduce
this distinction for animate nouns in Russian.


36. This restriction specifically refers to nouns which can appear as
constituents in appositions discussed in this report.

37. "Test A", for adjective-relatedness, applies only to constituents of the
types shown in lines 7 and 6 of Table II-6, which both have the property
of being able to combine as C1 constituents with C2's which are "titles"
(cf. line 9 of Table II-6). Adjective-related C1's (line 7) must always
agree in gender with their C2 in such constructions.


38. The statement should read in full: "Is the noun morphologically or
semantically related to an adjective?" For example, STARIK ('old
man') - STARY1 ('old'), BOGAC ('rich person') - BOGATY1 ('rich').


39. Genera-species distinctions implied in the tests are intended to
distinguish nouns so related: RYBA ('fish') and AKULA ('shark'); QVETOK
('flower') and LILI4 ('lily'); GAZ ('gas') and BUTAN ('butane'), etc.
Taking an extreme position, completely satisfactory recognition of
appositions would require encyclopedic information and discourse
analysis capability. Thus, in a sentence like $QAR6 $4MOMOTO
PRIKAZAL KAZNIT6, two analyses are possible.

(a) Tsar Yamomoto ordered an execution.

(b) The Tsar ordered Yamomoto to be executed.

To avoid the former analysis, it would be necessary to appropriately
tag the words QAR6 ('tsar') and $4MOMOTO ('Yamomoto') in order to
prevent the combination 'Tsar Yamomoto' since of all the tsars none
was named Yamomoto. This type of problem affects all appositions
discussed in this report with various degrees of severity, especially
in instances where genera-species distinctions are concerned.

40.  Additional  refinements  may  be  required.  The  problem  of  how  to  treat 
foreign  names,  especially  Chinese  or  Arabic,  was  not  studied  in  detail. 

41. The number of nouns which "usually" do not function as members of
close appositions is large. However, the examples listed nearly
exhaust the instances where such function is extremely unlikely.

42. The first name-patronymic construction can be in apposition to the
last name when the last name has another appositive: INJENER
$PETROV, $IVAN $IVANOVIC ('engineer Petrov, Ivan Ivanovich').

43. Any member of an apposition can be a compound constituent which
can be formed according to models sketched in (2.1.4C). However,
it is the peculiarity of this type of apposition that when several titles
qualify a person "in different planes" (Bylinskii and Rozental' (1959;
30)) such titles can form an accumulative string. For instance,
KOMSOMOLEQ INJENER $IVANOV ('Komsomol member, engineer
Ivanov'). See also note 65.

44. The apposition recognized by this rule has agreement peculiarities
involving the gender of adjectives and, where applicable, of verbs.
Several articles, of which Protchenko (1961) is a fair example, have
appeared about this problem. Rozental' (1965; 243) suggests a set of
rules which are sufficiently comprehensive to eliminate the need for
further discussion of this topic here.

45. The two options are: (a) agreement in case between the common noun
and the proper name (POD GORODOM $KALUGO1 'near the city of
Kaluga') or (b) instances where the proper name remains in the
nominative (NA OZERE $IL6MEN6 'on lake Il'men''). A summary of
current usage is provided in Rozental' (1965; 253-256).

46. The practical difficulty pointed up by this example, which becomes
especially pronounced in the A12 subgroup (constructions with
nomenclature items), is that even though a detailed subclassification can
help avoid certain difficulties, the size of the dictionary required may
easily become prohibitive. If, however, proper names are not included
in the dictionary on a large scale, a personal name and a geographical
name alternative may have to be considered for every proper name
not found in the dictionary. Taking $VIARDO ('Viardot') as a possible
substitution for $VOLGA in our example, the following interpretations
are possible.

(a) Beyond the mountain, the Viardot becomes cooler.

(b) Beyond Mt. Viardot it becomes cooler.

(c) Beyond Viardot's mountain it becomes cooler.

(d) Beyond the mountain, Viardot starts to feel chillier.

In (a) and (b), Viardot is treated as a geographical name; in (c) and (d)
as a personal name.
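
The fallback described in note 46 can be pictured as follows. This is a minimal sketch, not the CSA dictionary lookup: the dictionary contents, the class labels, and the function name are hypothetical, and treating a leading "$" as the mark of a capitalized (proper) word is an assumption drawn from the transliterated examples in these notes.

```python
# Hypothetical miniature dictionary: transliterated form -> noun subclasses.
DICTIONARY = {
    "$VOLGA": ["geographical_name"],
    "GORA": ["common_noun"],
}

# Both readings are kept for a proper name that the dictionary does not list.
UNKNOWN_PROPER_NAME_CLASSES = ["personal_name", "geographical_name"]


def noun_classes(form, dictionary=DICTIONARY):
    """Return the subclass alternatives a parser would have to consider."""
    if form in dictionary:
        # Listed forms carry only the subclasses coded for them.
        return list(dictionary[form])
    if form.startswith("$"):
        # Unlisted capitalized form: assume it may be either kind of name.
        return list(UNKNOWN_PROPER_NAME_CLASSES)
    # Unlisted common word: no alternatives are available.
    return []


print(noun_classes("$VOLGA"))
print(noun_classes("$VIARDO"))
```

With both alternatives returned for $VIARDO, a parser has the material for the geographical readings (a) and (b) as well as the personal-name readings (c) and (d); keeping the dictionary small is paid for with this extra ambiguity.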

47. See Bylinskii and Nikol'skii (1957; 60). Possibly this subgroup of
appositions should be considered as part of the A12 subgroup
(constructions with nomenclature items).

48. The plural can occur only when one of the constituents is a compound
constituent -- a situation not considered here.

49. Nomenclature items are frequently given enclosed in quotation marks
(see the type of apposition shown in the second example in note 29).
Criteria defining each option are not easy to recognize mechanically
and the two options should be considered possible. However, since
the use of appositives enclosed in quotation marks is not discussed in
this report, only the other option is considered.

50. Comments made in notes 39 and 46 apply. In technical texts,
appositive relations frequently are formally indiscernible from those of
government. For instance, in a description of the functioning of the
PT-1 semiconductor triode, the following references were encountered:
TRIOD $P$T-1 ('PT-1 triode') ... NA BAZU $P$T-1 ('on the base (of)
PT-1 (triode)') ... NA KOLLEKTOR $P$T-1 ('on the collector (of)
PT-1 (triode)'), and so on. Moreover, if $P$T-1 is tagged as the name
of a triode, it is not impossible to anticipate a construction like the
following: 3TOT TRIOD $P$T-1 ZAMENIT6 NE MOJET ('this triode
cannot replace the PT-1 (triode)').

51. This restriction is introduced in order to avoid redundant analyses
which would otherwise result.

52. Among the other constituents are such items as NFNP, produced by a
subrule discussed in A9 of Section 2.1.4, and NPB (cf. subrule (60) in
2.1.4C).

53. The ability of nouns to be simultaneously modified by adjectives on both
sides requires further study as part of the larger problem of modifier
strings. As noted in note 19, a variety of adjectival strings have been
identified in the course of studies carried out during the contract period.

It is important to distinguish (a) the two basic forms of composition of
adjectival strings (those consisting only of adjectives versus mixed
strings of the type OSOBA4, TEXNICESKOGO POR4DKA, PAUZA ('a
special pause of technical nature')); (b) the relative position of such
strings (pre- versus post-positional with respect to the noun); and (c)
obligatory versus optional detachment. Much of this information has
to be obtained from a direct study of source materials because the
available references provide a spotty picture and linguistic intuitions
are inadequate to anticipate all of the logically possible combinations.

54. Formally, agreement in number is affected. Some of the more obvious
instances are suggested in Rozental' (1965; 247-52). Tentative
recognition rules covering the various possibilities have been worked out.
However, they could only be tested manually and are not discussed in
this report.

55. The usage is not settled in Modern Russian. The information shown in
Table II-9 is incomplete (instances involving adjectives of the type
QELY1 ('whole') and numerals of the type POLTORA ('one and
one-half') are not considered). Instances marked as non-standard
(preposition of adjectives to numerals of the S1 type, e.g., twenty-one) are
not uncommon. Cf., e.g., the problems with pluralia tantum nouns
which cannot combine with S1, S2, or S3 (e.g., 21, 42, or 73)
numerals requiring nouns in the singular (*SOROK DVOE NOJNIQ 'forty-two
scissors') but do combine with all other numerals requiring (genitive)
plural (SOROK P4T6 NOJNIQ 'forty-five scissors'). For additional
references, see Suprun (1964; 8i-d), Listvinov (1965; 168-70),
Rozental' (1965; 243-6).

56. In addition to the Academy Grammar sections on word groups (Vol. II, 1;
113-353), relevant sections on prepositions and preposition-noun
phrases in Peshkovskii (1956) and Vinogradov (1947) were used as
source materials.

57.  Predications  are  named  according  to  the  type  of  sentences  they  produce. 

58. Standard references are the Academy Grammar (Vinogradov (1960)) and
the Moscow State University Grammar (Galkina-Fedoruk (1964)). In
addition, Gvozdev (1952), Rudnev (1963), Peshkovskii (1956), and Valgina
et al. (1962), among others, were regularly consulted.

59. In addition, style manuals like Rozental' (1965), specialized studies,
specifically those of Ebeling (1958) and Oil*ehei»ok (1964), and
numerous other sources have been studied. While there is a great deal of
repetition of basic facts, useful insights can be gained from these
sources regarding individual facets of the problem.

60. In order to cope effectively with the problem of subject-predicate
agreement, it is necessary to consider additional features discussed
in Section 2.2.2.

61. See Galkina-Fedoruk (1964); also Galkina-Fedoruk (1958).

62.  One  of  the  better  discussions  of  this  topic  is  found  in  Kolshanskii 
(1965;  480-5). 

63. This was not consistently carried through all of the rules. The
discrepancies were only partially corrected.

64. One of the better descriptions is to be found in Gvozdev (1952), later
reworked in Gvozdev (1964). In many regards the use of coordinative
compounds is affected by considerations of style and hence considered
a borderline grammatical problem.

65. This distinction is elaborated in Bylinskii and Rozental' (1959; 30).
The concept of odnorodnost' ('homogeneity, uniformity') as it applies
to modifiers, for instance, is positionally affected. Hence in
NOVATOR PROIZVODSTVA TOKAR6 TOVARI5 $BORISOV ('industrial
innovator, lathe operator, comrade Borisov'), the preposed
appositives are not "homogeneous". However, in postposition they are and,
as evidenced by the punctuation, form a coordinative compound:
TOVARI5 $BORISOV, NOVATOR PROIZVODSTVA, TOKAR6. Some
interesting insights are also found in Golovin (1959).

66. A good semantic analysis of relations expressed by coordinative
conjunctions is given in Gvozdev (1952), Bylinskii and Rozental' (1959)
and Figurovskii (1961; 12-16). Materials from these and other sources
(for instance, Shapkin (1964)) were collected but are not included in
this report for reasons of space limitations.

Such expressive usage is not considered here, although the provision
for this possibility exists in subrule (64). Conjunctions like "I" ('and')
should be considered as having two alternatives: (a) conjunctions used
once and (b) iterative conjunctions.

Paired conjunction constructs should be identified as being the first
and the second part of a compound. This distinction is not expressed
in the simplified rules that follow. The provisions necessary to
ensure appropriate lexical correspondence among members of paired,
iterative, and repeated single and paired conjunctions have also been
omitted from these rules.

Typical instances are given in the official rules of Russian orthography
and punctuation (Dobromyslov (1957)). More difficult cases are dealt
with in Bylinskii and Rozental' (1959) and in Shapiro (1966).

The punctuation of this particular sentence is open to question and it
may be argued that in order to obtain a second interpretation a comma
should follow MUJCINY. If this were true, then the interpretation
corresponding to (a) would be 'Inhabitants of the city (men) and women
... awaited ...'.

The section on subordinate clauses (Vol. II, 2; 266-380) was used in
conjunction with the discussion of subordinate conjunctions in Vol. I.

Cf. International Business Machines Corporation (1964).

Although the examples may have additional meanings, only those
specified should be considered.

Used here in the sense described, for instance, by Nida (1960; 45-8).

For additional discussion, see Vinogradov (1960), especially p. 239
in Vol. II, 1.

See Maslov (1965), especially pp. 70-79, for a brief survey of earlier
work and an illustration of twenty-five Aktionsart distinctions
currently recognized.

For further details, see appropriate sections of the references given
in note 58.

III. PREDICTIVE SYNTACTIC ANALYSIS OF RUSSIAN


Introduction 


In parallel with the work on CSA described in Sections I and II of this
report, a small study was conducted on predictive syntactic analysis of
Russian. The principal accomplishments in this latter area were: (1)
modification of the multiple-path predictive Russian Syntactic Analyzer to
bring it into operational status at the IBM Research Computing Center
(Section 3.1); (2) expansion and revision of the dictionary for the Analyzer
with emphasis on the syntactic properties of high-frequency function words
(Section 3.2); and (3) testing and evaluation of the performance of the
Analyzer on a sample of several thousand words of modern Russian text
(Section 3.3).


Additional activities undertaken in connection with this portion of the
project included a study of some of the recent literature on transformational
grammar of English, in particular Lakoff (1965) and Rosenbaum (1967),
as a source of potential insights into problems of Russian-English structural
transfer. Unfortunately, since the time and programming effort required to
make the Analyzer fully operational turned out to be substantially greater
than originally anticipated, the output of the Analyzer (the second major
ingredient of the proposed structural transfer study) was not available until
the end of the contract period. Accordingly, this portion of the work on
predictive analysis did not progress sufficiently to yield reportable results.


3.1 Programming Activities

The present section summarizes programming activities carried out
in support of predictive syntactic analysis of Russian. The central focus of
these activities was the process of bringing the multiple-path predictive
Russian Syntactic Analyzer -- an exhaustive sentence parsing system originally
developed at Harvard (Plath, 1963) -- into operational status at the IBM
Research Computing Center. In order to provide an appropriate framework for
the discussion, it will be helpful first to consider briefly the organization of
the system and the functions of its major components.

As can be observed from Figure III-1, within this parsing system the
process of obtaining syntactic analyses for the sentences of a Russian text
that has been transcribed onto punched cards requires the sequential
execution of three programs: the PRE-ANALYZER, SETSEN, and SYNTAX. The
PRE-ANALYZER begins by performing a variety of preliminary operations
on the input text, including serialization, code conversion, and formatting.
When these steps have been completed, the program proceeds to assign each
word in the text a set of syntactic alternatives with associated English
correspondents through a process of dictionary lookup. The resultant
output, known as the Preanalyzed Text Tape, serves as input to SETSEN. The
primary function of the latter program is to convert BCD codes for syntactic
alternatives into a considerably more compact binary representation
compatible with that used on the Grammar Tape. In addition to a Binary Sentences
Tape, which contains the compressed codes for the syntactic alternatives,
SETSEN also produces a BCD Sentences Tape containing the original BCD
codes together with their associated English correspondents. These two
tapes, along with the aforementioned Grammar Tape, serve as inputs to
SYNTAX, the predictive analysis program proper. SYNTAX systematically
applies the rules of the Russian predictive grammar to each sentence on the
Binary Sentences Tape according to the multiple-path predictive analysis
algorithm whose details are presented in Plath (1963). The resultant output
contains all surface structure analyses of each sentence that are consistent
with the current grammar and the current dictionary. As each analysis is
completed, SYNTAX edits it and writes it out in printable form, making use
of the appropriate material from the BCD Sentences Tape.
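
The flow of data through the three programs can be pictured with a toy model. The sketch below is illustrative only and rests on assumptions throughout: the dictionary entries, the numeric codes, the function names, and the toy grammar are all hypothetical, and the SYNTAX stand-in simply enumerates combinations of alternatives and filters them. It does not implement the multiple-path predictive algorithm of Plath (1963).

```python
from itertools import product

# Hypothetical dictionary: word form -> list of (syntactic code, English correspondent).
DICTIONARY = {
    "NOVY1": [("ADJ", "new")],
    "DOM": [("NOUN", "house")],
    "STOIT": [("VERB", "stands")],
}

# Hypothetical compact codes standing in for the binary representation.
CODE_TABLE = {"NOUN": 1, "VERB": 2, "ADJ": 3}


def pre_analyzer(text):
    """Serialize the words and attach their dictionary alternatives."""
    return [
        {"serial": i, "form": word, "alts": DICTIONARY.get(word, [])}
        for i, word in enumerate(text.split(), start=1)
    ]


def setsen(preanalyzed):
    """Return a compact-code view and a readable view of the same sentence."""
    binary = [[CODE_TABLE[code] for code, _ in w["alts"]] for w in preanalyzed]
    readable = [w["alts"] for w in preanalyzed]
    return binary, readable


def syntax(binary, readable, accepts):
    """Collect every combination of alternatives that the grammar accepts."""
    analyses = []
    choices = [range(len(alts)) for alts in binary]
    for pick in product(*choices):
        codes = [binary[i][k] for i, k in enumerate(pick)]
        if accepts(codes):
            # Readable material is used only when writing out an accepted analysis.
            analyses.append([readable[i][k] for i, k in enumerate(pick)])
    return analyses


def toy_grammar(codes):
    """Accept any number of adjectives followed by a noun and a verb."""
    rest = list(codes)
    while rest and rest[0] == CODE_TABLE["ADJ"]:
        rest.pop(0)
    return rest == [CODE_TABLE["NOUN"], CODE_TABLE["VERB"]]


binary, readable = setsen(pre_analyzer("NOVY1 DOM STOIT"))
print(syntax(binary, readable, toy_grammar))
```

Keeping the compact codes and the readable material in two parallel structures mirrors the split between the Binary Sentences Tape and the BCD Sentences Tape: the grammar test looks only at the codes. A word missing from the dictionary has no alternatives, so the sentence containing it receives no analysis.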


In addition to the main sequence of programs just described, the
predictive analysis system for Russian includes facilities for grammar and
dictionary maintenance. Dictionary updating is performed by the UPDATE
DIC program, which produces the dictionary tape file employed by the lookup
phase of the PRE-ANALYZER. The tape file can be listed (with the Russian
word forms transliterated) by using the PRINT DIC program. The grammar
file, maintained in BCD form on cards, is converted into a more compact
binary equivalent by the SETGRM program, which also produces a serialized
BCD listing of the rules. Both the UPDATE DIC and the SETGRM programs
include extensive provisions for the detection of format errors and invalid
codes.


The first task performed in making the predictive analysis system
operational at the IBM Research Computing Center was that of modifying all
of its component programs to render them compatible with the local FMS II
system. In most cases this was a relatively simple task, involving only
minor modifications such as the alteration of tape assignments. For the
programs with large storage requirements (the PRE-ANALYZER and
SYNTAX), however, there were serious problems of storage overlap with FMSII
system routines. These problems were eventually resolved with some
difficulty at the price of doing without certain of the service programs normally
available under the system.


Once the predictive analysis programs had been made compatible
with the local FMSII system, it became possible to test them out on a large
scale. While no difficulties were encountered in running SYNTAX and
SETGRM, which had been extensively tested at Harvard over a considerable
period of time, significant errors were discovered both in SETSEN and in
the dictionary lookup routine of the PRE-ANALYZER, which had previously
been checked out only on very limited samples of text. Beyond the
modifications necessary to correct these errors, a number of other changes were
made to the predictive analysis system in order to facilitate its employment
for present research purposes. These latter changes included: (1)
extension of the PRE-ANALYZER to permit acceptance of input text keypunched
for the CSA system (Section 1); (2) extension of the dictionary update package
through development of a program for the automatic generation of update
control cards for entries to be added to the dictionary file; and (3)
improvement of the facilities for handling error conditions in SYNTAX.

In addition to the programming activities reported on above, which
directly involved the predictive analysis system, a number of small support
programs were written during the contract period to perform such varied
functions as tape editing, dumping, and compilation of dictionary update
statistics.


3.2 Expansion and Revision of the Dictionary

During the contract period, the dictionary for the Russian predictive
analyzer underwent considerable expansion, as well as a moderate amount
of revision. The expansion process had two main objectives: the first was
to provide syntactic coding and English correspondents for all lexical items
occurring in the 160-sentence sample of Pravda editorials (Section 2.1),
thereby making it possible to employ the sample in testing the grammatical
coverage of the system as a whole; the second was to extend the coverage of
the dictionary to include additional high-frequency function words which
might be expected to occur in a wide variety of texts.

Fulfillment of the first objective turned out to be a major undertaking,
involving consultation of a variety of dictionaries and grammars in order to
determine the appropriate syntactic coding for each of over 1500 lexical
items. Once this task had been completed, however, fulfillment of the
second objective was relatively easy, in that fewer than sixty additional items
had to be processed in order to account for all high-frequency function words
appearing in either of the two sources employed for word frequency data:
(1) Kozak's list of approximately 1000 high-frequency word forms in Russian
physics (Kozak, 1962) and (2) the 1000 most frequent word forms in the Air
Force corpus described in Section 2.3. In the total expansion process, the
dictionary grew from a 1401-entry file covering 940 distinct forms to a
3315-entry file covering 2508 distinct forms -- a total increase in coverage
of 1563 word forms.

Most of the revisions to the dictionary were made following the first
complete analysis run on the entire 160-sentence Pravda sample. The great
majority of the modifications involved new entries whose syntactic coding
had been found in the course of analysis to be either incorrect or incomplete.
A handful of the original entries were modified for similar reasons, while a
somewhat greater number of them (primarily those for numerals and
quantifiers) had their sets of syntactic alternatives revised to correspond to
changes adopted for new entries with similar syntactic properties.

3.3 Performance of the Analyzer on the Test Sample

Two complete analysis runs were made on the 160-sentence test
sample of Pravda editorials, the first before the revision of the dictionary,
and the second immediately thereafter. On the first run, one or more
analyses were obtained for only 69 of the 160 sentences; the rest had no
analyses. Furthermore, of the 69 sentences for which some analysis was
obtained, only [number illegible] had an analysis which was judged to be
completely correct.* On the second run, the results were somewhat better,
owing to an improvement of the grammatical coding in the dictionary: at
least one parsing was obtained for each of 94 sentences, 78 of which had an
analysis that was considered completely correct.**
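
The proportions implied by these counts can be worked out directly. The
following is a minimal sketch in Python using only the sentence counts quoted
above; the variable and function names are illustrative assumptions and are
not part of the analyzer.

```python
# Sentence counts quoted in the text for the two analysis runs.
total_sentences = 160
parsed_run1 = 69    # first run: sentences with one or more analyses
parsed_run2 = 94    # second run: sentences with at least one parsing
correct_run2 = 78   # second run: sentences with a completely correct analysis


def percent(part, whole):
    """Return part/whole expressed as a percentage (unrounded)."""
    return 100.0 * part / whole


coverage_run1 = percent(parsed_run1, total_sentences)
coverage_run2 = percent(parsed_run2, total_sentences)
correct_overall = percent(correct_run2, total_sentences)
correct_given_parse = percent(correct_run2, parsed_run2)

# Sentences left without any analysis, and without a correct one, on run 2.
unparsed_run2 = total_sentences - parsed_run2
no_correct_run2 = total_sentences - correct_run2

print(f"Run 1: sentences with some analysis  {coverage_run1:.1f}%")
print(f"Run 2: sentences with some analysis  {coverage_run2:.1f}%")
print(f"Run 2: sentences analyzed correctly  {correct_overall:.1f}%")
print(f"Run 2: correct among those analyzed  {correct_given_parse:.1f}%")
print(f"Run 2: sentences with no analysis    {unparsed_run2}")
print(f"Run 2: sentences with no correct one {no_correct_run2}")
```

On these figures, slightly under half of the sample received a correct
analysis on the second run, while roughly five out of six sentences that
received any analysis at all received a correct one.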

The results of the second run are summarized in greater detail in
Table III-1, which displays for each sentence its serial number, length in
words, number of analysis segments, error type (if any), and running time
in minutes. With regard to performance in terms of running time for SYNTAX,
the entire sample (3230 words) took 248.69 minutes to process, or
slightly in excess of four hours. As can be seen, analysis of some of the
longest sentences was extremely time-consuming, while processing of
sentences of average length (twenty words) generally took only a few seconds.
Had the seven sentences of length greater than 40 words (i.e., sentences 34,
42, 46, 63, 88, 91, and 158) been eliminated, the remainder of the text (2883
words) would have been parsed in the more reasonable span of 76.06 minutes.
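
The share of running time consumed by the seven longest sentences follows
from the figures just quoted. The sketch below, in Python, is an illustrative
calculation only; the variable names are assumptions introduced here, and no
per-sentence timings from Table III-1 are used.

```python
# Figures quoted in the text for the second analysis run.
total_words = 3230       # words in the full 160-sentence sample
total_minutes = 248.69   # running time for the full sample
rest_words = 2883        # words left after dropping the seven longest sentences
rest_minutes = 76.06     # running time for that remainder
n_sentences = 160
n_long = 7               # sentences of more than 40 words

# Words and time attributable to the seven long sentences.
long_words = total_words - rest_words
long_minutes = total_minutes - rest_minutes

average_length = total_words / n_sentences
share_of_words = long_words / total_words
share_of_time = long_minutes / total_minutes

# Mean time per sentence in each group (a mean, not a typical value).
minutes_per_long = long_minutes / n_long
seconds_per_other = 60.0 * rest_minutes / (n_sentences - n_long)

print(f"Average sentence length: {average_length:.1f} words")
print(f"Long sentences: {long_words} words, {long_minutes:.2f} minutes")
print(f"Share of words: {share_of_words:.1%}; share of time: {share_of_time:.1%}")
print(f"Mean time per long sentence:  {minutes_per_long:.1f} minutes")
print(f"Mean time per other sentence: {seconds_per_other:.1f} seconds")
```

Under these figures the seven long sentences contain about one-ninth of the
words in the sample but account for more than two-thirds of the running time.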


*The criteria employed in deciding what constituted a "completely correct"
parsing are in part subjective and hence somewhat difficult to describe.
The basic requirement was that the given analysis represent a surface
structure corresponding to the "normal" interpretation of the sentence
with regard to such matters as: (1) the selection of syntactic alternatives;
(2) the identity of grammatical subjects, objects, and complements within
clauses; (3) the correct correspondence of modifiers with heads, relative
pronouns with antecedents, and members of coordinate constructions with
one another; and (4) nesting of infinitives and clauses within other clauses.

**One major systematic weakness of many of the "correct" analyses obtained
by the system is the treatment of most prepositional phrases and
some adverbs as "floating structures", i.e., as structures whose syntactic
relationship to the rest of the sentence is completely unspecified.

Table III-1: Summary of Predictive Analyzer Performance

Aside from making fundamental improvements in the parser, perhaps along
the lines proposed by Kuno (1965), the problem of running time could be
substantially reduced either by imposition of an arbitrary upper limit on the
length of sentences processed or by running a successor to the present
program on an interactive system.

Although the performance of the Analyzer in terms of running time
was generally consistent with that observed earlier for a sample of
scientific text (Plath, 1963; Thorpe, 1964), there was a significant
degradation in performance with regard to the percentage of sentences for
which correct analyses were obtained. For the sample of scientific text, 66
of 73 sentences, or about 90 per cent, received a correct analysis; whereas,
in the present test, the percentage of sentences with correct parsings was
slightly below 50 per cent. The explanation for this large discrepancy
clearly does not lie in the change of vocabulary brought about by the shift
from scientific text to editorials, since deficiencies in grammatical coding
for the new lexical items were largely eliminated in the dictionary revision
carried out prior to the second run. Moreover, although the random method of
sentence selection may have led to greater than average syntactic diversity
in the resulting test sample, examination of the output indicates that this
effect was not sufficiently pronounced to account for more than a small part
of the observed discrepancy. Instead, as will become evident in the
discussion of error types below, the principal effect of the shift from
scientific text to editorials was the employment with great frequency of one
particular construction not provided for in the Russian predictive grammar.

Except for a few instances of incorrect handling of some of the
typographical features of the input text, each of the errors detected involved
failure to provide coverage for one or another feature of the surface syntax
of the sentence in question. In an attempt to focus on recurrent problems,
the errors have been grouped together into the following six types:
Type A - instances of asyndetic coordination; Type B - new case constructions;
Type C - new patterns of word order or nesting; Type D - new agreement
relationships; Type E - input errors; and Type F - miscellaneous.

Type A errors, involving instances of asyndetic coordination, were
by far the most frequent for the Pravda sample. No fewer than 54 of the 82
sentences with no completely correct analysis (or about two-thirds of them)
contained at least one coordination of this type -- a construction for which
there is no provision in the predictive Russian grammar. Briefly, an
asyndetic coordinative construction (cf. Section 2.1.4C) is a coordination in
which the components are linked by punctuation (usually commas or
semicolons), but where no conjunction is present. Thus we have, for example,
in sentence 30 of the test sample: $UVELICENIE IX ZAGRUZKI SPOSOBSTVUET
ROSTU PROIZVODITEL6NOSTI TRUDA, SNIJENIH SEBESTOIMOSTI PRODUKQII,
UVELICENIH NAKOPLENI1. ('Increase of their load promotes growth of
productivity of labor, reduction of the net cost of production, increase of
accumulations.') In this instance, the second member of the coordinative
construction, SNIJENIH ('reduction'), was incorrectly analysed as an
appositive to the first member, ROSTU ('growth'), while the third member,
UVELICENIH ('increase'), was similarly misinterpreted as standing in
apposition to SNIJENIH. There were more numerous instances, however, where
no analysis was obtained at all, either because the components of an
asyndetic compound noun phrase did not satisfy the agreement requirements
for appositions, or because the asyndetic coordination in question involved
components of other syntactic types, such as verbs or prepositional
phrases.

While it would be a trivial matter to alter the predictive grammar to
provide for the recognition of asyndetic coordinations, under present
circumstances the resulting increase in both the number of spurious analyses
and the associated processing time would probably assume disastrous
proportions. The only recourse would appear to be that of waiting until such
time as it is possible to formulate much more stringent restrictions on what
constitutes a linguistically acceptable coordination or apposition. In the
meantime, it is somewhat reassuring to note that scientific writers do not
seem to favor the use of asyndetic coordination to the extent that Pravda
editorialists do; in fact, in the sample of scientific text processed earlier
(Plath, 1963), not a single instance was found.


Except for the miscellaneous category (Type F), various case
constructions not provided for in the current Russian grammar represented the
second most frequent error type, with 15 occurrences. The bulk of these
errors (12 occurrences) involved either (a) loosely governed instrumentals
and datives of reference, not predictable from the verb, or (b) uses of the
genitive peculiar to date constructions -- e.g., from sentence 69,
... MARTOVSKI1 I SENT4BR6SKI1 ($$1965 GODA) PLENUMY ... ('... the March
and September (of the year 1965) plenums ...'). The remaining three
instances involved: (a) employment of the nominative case as a "vocative"
(sentence 105) -- $VDUMA1S4, TOVARI5, V QIFRY PRIVEDENNYE V
SOOB5ENII $Q$3$U. ('Think, comrade, of the figures presented in the
communication of the Central Statistical Bureau.'); (b) use of the
nominative in a quoted title in apposition to a noun in another case
(sentence 109) -- ... PO GAZETE "$STAVROPOL6SKA4 PRAVDA" ... ('... according
to the newspaper "Stavropol'skaia Pravda" ...'); and (c) use of the genitive
for what would normally be the subject of a negated passive verb
(sentence 111) -- ... V R4DE MEST NE PRINIMAETS4 VSEX MER K ... ('... in a
number of places not all measures are being taken towards ...').


There were 12 instances of errors involving failure to account for
various word order and nesting arrangements (Type C). Five of them
involved constructions in which both members of a compound verb had a
common object, e.g. (sentence 35), ... UTVERJDAHT I RAZVIVAHT
LENINSKIE NORMY ... ('... affirm and develop Leninist standards ...').
While such constructions could be handled within the framework of the
predictive grammar without any of the unpleasant consequences that would
ensue upon the elimination of Type A errors, a substantial increase in the
number of subrules for verbal constructions would be required for this
purpose. The remaining Type C errors stem from various word order patterns
not covered by the present grammar. The majority of them are probably
"normal" enough to warrant inclusion of appropriate provisions for them in
the grammar, and it appears that very few additional subrules would be
required for this purpose.

Seven agreement errors (Type D) were detected in the sample, three
of them involving conjoined adjectives of different number, and the
remainder having to do with plural agreement of singular nouns which can be
considered to denote entities consisting of a number of components, i.e.:
CAST6 ('part' - sentence 90), BOL6WINSTVO ('majority' - sentence 113),
OSNOVA ('foundation' - sentence 123), and MASSA ('mass' - sentence 140).
Input errors (Type E) involved the loss of colons, semicolons, or
parentheses in six sentences. Finally, seventeen sentences had errors of
miscellaneous types, ranging from failure to detect instances of
substantivisation and ellipsis to non-recognition of certain punctuation
patterns and discontinuous idioms.
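
The counts given for the six error types can be tallied as follows. This is
a minimal sketch in Python based only on the figures quoted in the text; the
dictionary labels and variable names are assumptions made for illustration.
Note that the text reports Types A, E, and F as numbers of sentences and
Types B, C, and D as numbers of occurrences, so the combined total is only a
rough indication.

```python
# Error counts by type, as quoted in the text. Types A, E, and F are reported
# as numbers of sentences; Types B, C, and D as numbers of occurrences.
error_counts = {
    "A": 54,  # asyndetic coordination
    "B": 15,  # new case constructions
    "C": 12,  # new patterns of word order or nesting
    "D": 7,   # new agreement relationships
    "E": 6,   # input errors
    "F": 17,  # miscellaneous
}

# Sentences with no completely correct analysis on the second run.
failed_sentences = 160 - 78

# Rank the error types from most to least frequent.
ranked = sorted(error_counts.items(), key=lambda item: item[1], reverse=True)
for error_type, count in ranked:
    print(f"Type {error_type}: {count}")

total_records = sum(error_counts.values())
share_type_a = error_counts["A"] / failed_sentences

print(f"Total error records: {total_records}")
print(f"Sentences without a correct analysis: {failed_sentences}")
print(f"Share of those sentences with a Type A error: {share_type_a:.1%}")
```

Because the combined total exceeds the number of sentences lacking a correct
analysis, at least some of those sentences must carry more than one error
record, assuming every recorded error falls within that group.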

In addition to those inadequacies of grammatical coverage explicitly
recorded in the form of error type notations in Table III-1, other
shortcomings are implicit in the numerous instances where multiple analyses
were obtained. The total of 279 analysis segments works out to an average
of about three analysis segments per sentence if only the 94 sentences which
actually were given analyses are considered. This average compares
favorably with the figure of approximately four analysis segments per sentence
for the sample of scientific text; the reduction is largely attributable to the
adoption of some of the recommendations made in the earlier study for
alleviation of the problem of multiple analyses (Plath, 1963: 4-134 ff.).

Despite the reduction in the number of incorrect analyses, the
residue (an average of two incorrect analyses for each correct one) is still
very large. Examination of the output for the test sample indicates that much
of the difficulty with multiple analyses is traceable to the continuing lack of
adequate linguistic definitions of what constitutes either an acceptable
coordination or an acceptable apposition, a point already mentioned in the
discussion of Type A errors. Moreover, as indicated in the same discussion,
attempts to extend grammatical coverage in certain directions, such as that
of handling asyndetic coordination (or of tying prepositional phrases to
specific governors), would seriously increase the number of undesirable
analyses, given the present status of linguistic description of Russian.


A review of the results of the present experiment suggests the
following two observations. The first is that, assuming that the results
obtained are not entirely unrepresentative of the current state of the art,
automatic syntactic analysis of Russian sentences must still be regarded as
being in its experimental stages, and hence not yet ready for employment in
text processing applications. Although much work remains to be done on the
computational side, particularly if a capability for transformational
recognition is desired, it appears that the more fundamental current
obstacles to practical text processing application stem from unsolved
linguistic problems of the sort alluded to in the preceding discussion. The
second observation is suggested in part by the sharp differences in syntactic
pattern observed between the sample of Pravda editorials and the sample of
scientific text. While the prospect of constructing a huge array of
interlocking "microgrammars" in order to handle texts of various types is an
extremely uninviting one, the possibility of constructing a more restrictive
grammar adequate for a single specific field appears worthy of serious
exploration.